ABSTRACT


Fast, ultra-reliable, real-time computing is fundamental in today’s weapons systems.
Increased system throughput and reliability can be achieved by utilizing distributed
systems in which a single application program executes on multiple processors
connected to a network. The distributed nature of such systems makes it possible
to tolerate failures and react to overloads without the application-level performance
degrading unacceptably. Fault tolerance in these systems typically involves fault
detection and recovery. Repair following failure involves smooth integration of the
repaired processor and subsequent reconfiguration. These actions must take place
transparently, that is, without the application program noticing them. Therefore,
sufficient information must be maintained through the use of checkpointing to describe
the state of the system at any time and ensure correct operation after failure or repair.

This thesis investigates a possible framework for achieving a fault-tolerant real-time
distributed system which provides transparent function-to-function message
passing, monitors status using periodic health messages, and maintains a globally
consistent system state by carrying out independent checkpointing procedures.
The proposed scheme is simulated using concurrent Ada processing for a four-node,
twelve-function distributed system.
I.  INTRODUCTION


A.  GENERAL 

Distributed  systems  have  become  increasingly  popular  in  satisfying  the  require¬ 
ments  for  increased  computing  power  and  also  as  a  means  of  achieving  fault  tolerance 
in  critical  real-time  systems  [Ref.  1].  Distributed  systems  are  often  defined  to  en¬ 
compass  a  wide  range  of  loosely  coupled  computer  systems,  especially  network  based 
systems.  In  loosely  coupled  distributed  systems,  there  are  no  shared  resources;  there¬ 
fore,  all  information  exchanged  between  the  relocatable  functions  must  occur  via 
message  passing  [Ref.  2].  As  the  processing  speed  of  system  nodes  and  the  trans¬ 
mission  capacity  of  message  transfer  media  increase  due  to  technological  advances, 
message  transmission  time  becomes  small  enough  to  provide  a  resource  management 
that  makes  the  distributed  nature  of  the  system  transparent  to  the  user.  This  resource 
management  must  maintain  continuity  of  processing  information  for  dynamically  re¬ 
located  functions  and  therefore,  requires  the  system  state  information  to  be  globally 
consistent  [Ref.  3].  This  state  consists  of  the  information  necessary  to  describe  the 
characteristics  of  all  system  nodes  and  functions.  In  order  to  maintain  global  con¬ 
sistency,  some  method  of  checkpoint  and  rollback  procedures  must  be  utilized.  A 
checkpoint is a saved local state of a node’s active functions [Ref. 4]. A set of
checkpoints, one per node, is consistent if the saved states form a consistent global state.
Rollback  is  defined  as  the  retransmission  of  messages  from  the  last  checkpoint  in  order 
to  restart  the  system  after  node  failure. 

Two  approaches  to  node  recovery  and  function  reconfiguration  are  replicated 
execution  and  local  checkpointing,  coupled  with  rollback,  to  build  a  consistent  global 
state. The problems of keeping replicas consistent in the former are formidable [Ref.
5]. Also, the number of node failures which can be tolerated must be known a
priori in order to determine the requisite number of replications. In the absence of
synchronization, functions cannot all recover simultaneously. Recovering functions
asynchronously can introduce situations in which a single failure can cause an infinite
number of rollbacks, preventing system progress. Local checkpointing may result in
a rollback whose completion time can vary considerably; therefore, it is unsuitable for
mission-critical environments [Ref. 6].

The  proposed  framework  for  a  distributed  system  utilizes  the  replication  of  code 
at  each  node  and  maintains  a  global  snapshot  of  the  system  state.  This  framework 
minimizes  recovery  time,  making  it  unnecessary  to  use  rollback  procedures  during 
migration,  except  in  cases  of  node  failure. 

B.  AIM  OF  THE  STUDY 

The objective of this thesis is to implement the framework necessary to provide
transparent function-to-function message passing, fault detection, and checkpointing
in a robust, real-time distributed system. Robustness is the system’s ability to
withstand failures and utilize reconfiguration to minimize the impact of these failures on
overall system performance. Distribution requires the partitioning of an application
program into multiple functions, the code for which is resident at every node.
However, the responsibility for execution of a particular function is assigned to only one
node in this framework. This function assignment may be fixed at initialization or
may change as a result of reconfiguration. Communication between these dynamically
relocatable functions is via a globally ordered network. This loosely coupled system
does not share any resources, as illustrated in Figure 1.1, which is reproduced from
another document [Ref. 7].


Figure  1.1:  A  Loosely  Coupled  Distributed  System 

The  scope  of  this  thesis  is  to  implement  the  means  necessary  to  provide  fault 
tolerance  and  maintain  the  required  information  to  allow  a  rapid  system  reconfigura¬ 
tion. 


C.  METHOD  OF  APPROACH 

This  thesis  focuses  on  a  single  application  executing  on  a  distributed  system. 
A  layered  architecture  was  chosen  to  organize  the  different  components  in  an  easy  to 
manage,  hierarchical  fashion.  The  layers  operate  concurrently,  yet  interface  to  main¬ 
tain  communication  between  dynamically  relocatable  functions.  This  enables  fault 
tolerance  and  load  balancing  efforts  to  proceed  independently  without  interruption 
of  the  actual  application  processing. 

Fault tolerance is accomplished by requiring each node in the system to
periodically broadcast its load. Receipt of these status messages not only indicates
that the node is operational, but also supplies the load information utilized in the
reconfiguration algorithms. These algorithms require globally consistent data upon which


Figure  1.2:  Software  Layer  Configuration  at  Each  Node 

to base their decisions. The globally consistent state information is maintained at
each node through the use of independent checkpointing procedures. A system node
containing four independent software layers, with internal communication paths
indicated by arcs, is depicted in Figure 1.2, which is reproduced from another document
[Ref. 7]. The Network Communication Layer (NCL) must be a globally ordered
communications protocol which enables the broadcast of all messages. The Location
Invariant Function to Function Communication Layer (LIFFCL) provides each node
with the necessary communications interface to the NCL and implements fault tolerance
and checkpointing procedures. The LIFFCL is the major emphasis of this thesis and is
covered extensively in Chapters III and IV. The Reconfiguration Layer (RL) handles
function allocation and reconfiguration and is covered in detail in [Ref. 8]. The
Applications Layer (AL) conducts actual application program execution and is responsible
for the message queue management of all active functions at a node. Specification of
AL functionality is to be covered in future thesis topics.
D.  ORGANIZATION


This thesis is organized as follows. Chapter II discusses the issues in a
distributed system and the mechanisms necessary to address these issues. Chapter III
discusses the means of achieving function-to-function communications, fault tolerance,
and maintaining state information. The detailed action of the tasks within the
LIFFCL of an individual node is illustrated in the state diagrams shown in Chapter IV.
An overview of the implementation software and the simulation results are contained
in Chapter V. Chapter VI contains the conclusion.
II.  ISSUES  IN  MAINTAINING  THE  SYSTEM  STATE

A.  GENERAL 

As  indicated  previously,  the  state  of  a  distributed  system  entails  all  the  variables 
necessary  to  describe  any  or  all  of  the  system  components  at  any  point  in  time.  The 
distributed  nature  of  such  a  system  requires  this  state  information  to  be  current  and 
accessible  by  all  nodes.  The  integrity  of  this  data  must  be  maintained  in  order  to 
implement fault-tolerant procedures which enable continuity of a function’s
processing regardless of its location. To prevent the loss of state of the functions running
on a node when the node fails, the system state must be periodically updated and
distributed to all nodes utilizing checkpointing procedures, as stated in Chapter I.
This  globally  consistent  state  information  is  required  by  reconfiguration  algorithms  in 
making  relocation  decisions.  These  algorithms  are  covered  in  another  thesis  [Ref.  8]. 
Issues  requiring  the  use  of  a  system’s  state  information  are  described  in  the  following 
sections. 

B.  ALLOCATION 

Allocation  is  achieved  at  compile  time  or  during  execution.  If  conducted  during 
execution,  it  requires  knowledge  of  the  current  system  state  information  obtained 
during  checkpointing. 

C.  MAINTAINING  STATE  OF  FUNCTIONS 

As  stated  earlier,  reconfiguration  efforts  require  a  globally  consistent  restart 
point.  This  restart  point  is  determined  by  storing  a  function’s  unique  variables  at  each 
node during checkpointing. In order to describe the state of a function, some of the
attributes that must be known about a function are the last message received, the last
message processed, the time remaining until completion, the time remaining until deadline,
all symbol variables, and the general register contents. When a function gets processing
time  at  a  node,  these  statistics  are  updated  and  stored  for  that  function.  Keeping 
the  state  of  every  function  at  every  node  prevents  retransmission  of  messages  if  the 
node  where  the  function  was  active  fails  or  cannot  complete  the  function  on  time. 
Another  node  can  activate  the  function  and  maintain  continuity  of  processing  rather 
than  restarting  the  function  at  the  last  checkpoint.  Each  node  maintains  a  unique 
section  for  the  data  relevant  to  its  active  functions.  All  nodes  share  this  data  by 
passing  other  nodes  their  unique  section  during  checkpoint  procedures  as  described 
in  Chapter  I.  This  allows  for  ease  of  transportability  of  functions  and  minimizes  the 
communications  required  for  this  migration. 

D.  MAINTAINING  STATUS  OF  NODES 

Another  factor  in  reconfiguring  a  system  is  the  operational  status  of  all  nodes. 
This  status  is  maintained  through  health  monitoring  schemes  which  depend  totally 
on  the  exchange  of  status  messages.  Detection  of  node  failure  must  result  in  the 
migration  of  the  assigned  functions  to  active  nodes.  Knowledge  of  each  node’s  status 
prevents  assigning  a  function  to  a  non-active  node. 

In conjunction with the status of a node, its current load is also important.
Knowledge of every node’s loading percentage may prevent a node from becoming
overloaded, which would result in functions not being completed on time. If a node is
fully loaded, transferring a function to it only overloads the node. This degrades
not only the individual node but the entire system, since the now overloaded node
requires additional communication in an effort to migrate a function and reduce
its loading. By keeping track of a node’s status and load, appropriate
decisions can be made when reconfiguration is necessary.

E.  ROUTING 

A  function’s  location  must  be  known  at  all  times  if  a  system  is  to  support 
function  to  function  communication  through  the  use  of  data  messages.  Nodes  must 
maintain  a  queue  for  each  function  in  order  to  store  all  data  messages  destined  for  a 
particular  function.  The  active  function  queues  are  maintained  in  the  AL,  and  the 
non-active function queues are maintained in the LIFFCL. Requiring each node to
maintain function queues minimizes the amount of traffic to be transferred during
migration of functions. This prevents rollback during reconfiguration, except in the
case of node failure. Checkpointing and fault detection schemes provide the means
to update the variables necessary to describe the global state of the system, as
indicated above. These variables are maintained in a resource called the node status
table (NST), constructed at each node, as shown in Figure 2.1, which is reproduced
from another document [Ref. 7]. The NSTs are kept consistent through the
exchange of node status messages, as well as marker messages during checkpoint. The
composition of the NST is detailed in the following section.

F.  NODE  STATUS  TABLE 

The NST comprises three sections: a section containing status information
that is common to all nodes, a section containing all the information unique to the
functions that are active on each node, and the node’s identity. A given node contains
two complete copies of the NST, the duplicate copy being designated node status
backup (NSTBAK). Duplication of data guards against loss of information as a result
of node failure during checkpointing. The NST contains variables which are used to
describe the health of all nodes, the state of all functions, and the events since the
[Figure 2.1 layout: COMMON SECTION (entries include FN_LOC and the node status);
UNIQUE SECTION (one subsection per node, N1 through Nn, each holding the function
variables of the functions assigned to that node); NODE ID]

Figure  2.1:  Node  Status  Table

last  checkpoint. 

1.  Common  Section 

The  node  status  indicates  if  a  node  is  up  or  down.  This  information  is 
updated  through  the  use  of  status  messages  transmitted  periodically  by  each  node.  If 
a  periodic  status  message  is  not  received  from  a  node  within  a  specified  time  interval, 
the  node  is  assumed  to  have  failed  and  is  logged  down. 

2.  Unique  Section 

The  unique  section  contains  the  current  state  information  for  all  functions 
within  the  system.  It  consists  of  a  subsection  for  each  system  node,  with  the  sub¬ 
sections  containing  separate  records  for  those  functions  assigned  to  the  appropriate 
node.  The  functions’  state  information  is  obtained  during  checkpointing  by  each  node 
exchanging  the  applicable  unique  subsections  of  their  NST. 

Each  node  records  and  saves  all  messages  sent  between  any  two  checkpoints. 
All messages are contained in one of three places at a given node. The active queue
in the AL contains messages for all functions assigned to the node, and the non-active
queue in the LIFFCL contains the messages for all remaining system functions. Also,
messages not yet transmitted or received by the node are in the Output Server
or Input Server queues, respectively. When a function is migrated, the receiving
node utilizes the messages from the non-active queue within its LIFFCL to update the
active queue for the activated function. Any messages in the output/input queues are
not affected by the migration process. However, if a node fails, its current unique
section is not accessible to the new node and any messages in its output/input queues
are lost; therefore, a rollback is necessary.

3.  Node  Identification 

NODE_ID is self-explanatory. Several of the algorithms within the LIFFCL
and RL use this variable to determine the identity of the node since all nodes are
running concurrently. Specifics on the use of NODE_ID can be found in the program
located in Appendix A.

4.  Local  Variables 

In  addition  to  the  NST,  each  node  maintains  local  variables  used  for  node 
recovery,  checkpointing,  and  queue  management.  These  variables  are  explained  in 
detail  in  the  following  sections. 

a.  Recovery  Variables 

The  recovery  variables  are  utilized  by  the  recovering  node  to  indi¬ 
cate  when  it  is  ready  to  commence  normal  processing.  These  variables  are  utilized 
to  prevent  unnecessary  communication  between  the  recovering  and  active  nodes  as 
explained  below. 

Recovery in Progress (RCVRY_IN_PROG) is the variable which
indicates that a recovery is taking place. It prevents another periodic message from
retriggering the recovery process. Retriggering the recovery process could put the
nodes in an infinite loop, in which case recovery of a node could never be completed.
Recovery (RCVY) is used to indicate when a node has completed recovery. In order
to recover, a node must rebuild its NST. This is accomplished by each of the other
nodes sending the common and unique sections of their NST. Each element of RCVY
indicates whether the corresponding node has sent its unique and common sections
of the NST to the recovering node. Once completion of recovery is detected, the
node clears the RCVY array and resets RCVRY_IN_PROG to false. Unique Sent
(UNIQ_SENT) is utilized by the active nodes to indicate that a node has responded
to a recovery operation by sending its NST sections. Once complete recovery is
detected, the nodes reset this variable. UNIQ_SENT prevents additional messages from
being generated.
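
The bookkeeping performed by the recovering node can be illustrated with a short
Python sketch. It is not taken from the thesis simulation, which is written in Ada;
the class RecoveryState and its method names are hypothetical, and only the two
variables RCVRY_IN_PROG and RCVY come from the description above.

```python
class RecoveryState:
    """Recovery bookkeeping at a recovering node (illustrative sketch only)."""

    def __init__(self, node_id, n_nodes):
        self.node_id = node_id
        self.rcvry_in_prog = False       # RCVRY_IN_PROG
        self.rcvy = [False] * n_nodes    # RCVY: one element per system node

    def start(self):
        """Begin recovery unless one is already running; return True if started."""
        if self.rcvry_in_prog:
            return False                 # a later periodic message must not retrigger
        self.rcvry_in_prog = True
        return True

    def on_nst_sections(self, sender, active_nodes):
        """Record that sender supplied its NST sections; return True when done."""
        self.rcvy[sender] = True
        done = all(self.rcvy[n] for n in active_nodes if n != self.node_id)
        if done:
            # Recovery complete: clear the array and reset the flag.
            self.rcvy = [False] * len(self.rcvy)
            self.rcvry_in_prog = False
        return done
```

In this sketch the second call to start() during a recovery is refused, which is the
behavior attributed to RCVRY_IN_PROG in the text.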

b.  Checkpoint  Variables 

The checkpoint variables are utilized when updating the global state
of the system. Checkpoint Taken (CHKPT_TAKEN) is utilized to indicate when a
marker message has been received from all active nodes. A marker message is sent
by a node which has conducted a local checkpoint. CHKPT_TAKEN is used by
the checkpoint originator to indicate when a checkpoint is complete. Event Count
Out (EVNT_CNT_OUT) keeps track of the number of messages that are sent to
the network. This is only used to track messages in the output files created by the
simulation program.
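
A minimal Python sketch of how a checkpoint originator might derive CHKPT_TAKEN
from received marker messages is given below. The class CheckpointTracker, the
"up"/"down" status strings, and the assumption that the originator also records its
own marker are illustrative choices, not details confirmed by the thesis.

```python
class CheckpointTracker:
    """Tracks marker messages for one checkpoint (illustrative sketch only)."""

    def __init__(self, node_status):
        self.node_status = node_status   # node id -> "up" or "down"
        self.markers = set()             # nodes whose marker has been received
        self.chkpt_taken = False         # CHKPT_TAKEN

    def on_marker(self, sender):
        """Record a marker; set CHKPT_TAKEN once every active node has sent one."""
        self.markers.add(sender)
        active = {n for n, s in self.node_status.items() if s == "up"}
        self.chkpt_taken = active <= self.markers
        return self.chkpt_taken
```

Nodes logged down are excluded from the comparison, so a failed node cannot hold a
checkpoint open in this sketch.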

c.  Queue  Management 

Queue management variables are required to ensure the integrity of
all messages at a given node. This is particularly important when dealing with
circular queues. Messages can be written over easily if pointers are not maintained
properly. For this reason, several variables are maintained for management of the
queues. MSG_TO_SEND is used to indicate that there are messages in the queue to
send. BLOCK_WRITE is used to prevent overwriting a message in the queue that
has not been read. RD_CNT is used as a pointer to the next message to be read.
MSG_CNT is used as a pointer to the next available queue slot into which a message
can be written.
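
The interplay of the four queue management variables can be sketched in Python as
follows. The class CircularQueue and its method names are hypothetical, and the rule
used to set and clear the flags (comparing the two pointers after each operation) is
one plausible reading of the description rather than the Ada code of the simulation.

```python
class CircularQueue:
    """Fixed-size message queue guarded by the flags named in the text."""

    def __init__(self, size):
        self.slots = [None] * size
        self.rd_cnt = 0            # RD_CNT: index of the next message to read
        self.msg_cnt = 0           # MSG_CNT: index of the next free slot
        self.msg_to_send = False   # MSG_TO_SEND: at least one unread message
        self.block_write = False   # BLOCK_WRITE: every slot holds an unread message

    def write(self, msg):
        """Store msg; return False rather than overwrite an unread slot."""
        if self.block_write:
            return False
        self.slots[self.msg_cnt] = msg
        self.msg_cnt = (self.msg_cnt + 1) % len(self.slots)
        self.msg_to_send = True
        # The write pointer has caught up with the read pointer: queue is full.
        self.block_write = self.msg_cnt == self.rd_cnt
        return True

    def read(self):
        """Return the oldest unread message, or None when the queue is empty."""
        if not self.msg_to_send:
            return None
        msg = self.slots[self.rd_cnt]
        self.rd_cnt = (self.rd_cnt + 1) % len(self.slots)
        self.block_write = False
        # The read pointer has caught up with the write pointer: queue is empty.
        self.msg_to_send = self.rd_cnt != self.msg_cnt
        return msg
```

Because equal pointers can mean either a full or an empty queue, the two boolean flags
are what distinguish the cases, which is why both are maintained alongside the pointers.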

G.  SUMMARY 

The  status  of  each  node  and  the  current  statistics  of  each  function  must  be 
maintained  in  the  NST  in  order  to  describe  the  global  state  of  the  distributed  system. 
Although  maintaining  the  variables  of  the  NST  requires  the  overhead  incurred  with 
checkpointing  procedures,  the  time  spent  is  more  than  compensated  for  by  quicker 
fault  detection  and  faster  and  more  efficient  reconfiguration  algorithms.  The  check¬ 
pointing  and  fault  detection  algorithms  utilized  to  maintain  the  NST  are  covered  in 
the  following  chapters. 
III.  THE  LOCATION  INVARIANT  FUNCTION
TO  FUNCTION  COMMUNICATION  LAYER 


A.  GENERAL 

This  chapter  examines  the  Location  Invariant  Function  to  Function  Communi¬ 
cation  Layer  (LIFFCL),  its  components,  and  their  interface  with  the  other  layers  of 
the  node.  The  LIFFCL  accomplishes  three  distinct  objectives  within  the  node.  The 
first  objective  is  to  provide  the  node  a  communication  interface  with  the  NCL,  in 
order  to  support  communication  between  the  system  functions.  Secondly,  it  performs 
fault  detection  by  monitoring  the  health  of  all  system  nodes.  It  also  generates  periodic 
health (status) messages to inform other nodes of its own status. Lastly, the LIFFCL
implements  checkpoint  procedures  which  are  utilized  to  develop  globally  consistent 
system  states. 

The LIFFCL comprises four specific components: Input Server (IS),
Output Server (OS), Status Monitor (SM), and Checkpoint (CP). It
provides a communication interface with the NCL via Output Server and Input Server.
Status Monitor provides fault detection, and Checkpoint monitors the occurrence
of events at a given node and implements checkpointing. All of the components of
this layer shown in Figure 1.2 are covered in detail in the following sections of this
chapter. The logical progression of events for a particular task at a given node is
illustrated in Chapter IV, utilizing state diagrams.

B.  INPUT  SERVER 

The Input Server is responsible for receiving message traffic from the
communication layer and redirecting messages to tasks within the node for the required
action. It parses each message to determine its type and the destination task that
completes the necessary action. The Input Server is a process that is activated
periodically, and a node actually receives messages only during this activation time
quantum. Therefore, a queue is utilized, into which the NCL places messages. Queue
management variables are utilized to indicate overflow and underflow conditions, as
well as to maintain message ordering within the queue.

The Input Server consists of two tasks, Node Initializer and Receive Msg. It is
initially given its node identification via a rendezvous call to task Node Initializer.
Thereafter, Input Server is activated periodically by the expiration of a delay
statement within the Receive Msg task. The duration of this delay is a parameter
which can be changed in relation to the periodicity of the NCL delay, in order to
analyze the effects on system throughput. The NCL delay determines the rate at which
messages are sent to the Input Server. The Input Server maintains a circular queue
which is written into by the NCL. The boolean variable BLOCK_WRITE is set to
prevent the NCL from writing over a message that has not yet been read by the Input
Server. When the NCL has a message to send, if BLOCK_WRITE is false, it places the
message into the next available slot of the Input Server queue and sets MSG_TO_SEND
to true.

Upon detecting MSG_TO_SEND, the Input Server parses the MSG_KIND field to
determine if the message is a data or control type. Data messages are sent to tasks
within the AL, or to the function queue manager task of the LIFFCL. Control messages
are sent to tasks within the RL or LIFFCL for the appropriate action. If the message
is a data type and the function designated by the DEST_FUNC field is active on that
particular node, the Input Server transfers the message to the AL. The AL must update
the NST’s unique section for the indicated function with the TOT of the last message
received for that function and also the last data message processed for that function.
If the data message is for a non-active function, Input Server sends it to a non-active
function queue array. The details of the AL and the task to manage the non-active
function queue are left for another thesis.

If the message is a control type, additional parsing of the CNTRL_ACTION
field is required. If the CNTRL_ACTION field is either a fnon or a fnoff, the
Input Server transfers the message to RL for further processing. When the
CNTRL_ACTION field is a marker (MKR) or a checkpoint complete (CHKPT) message,
Input Server transfers the message to Checkpoint. If the CNTRL_ACTION field
indicates a status (STATUS) message, the Input Server transfers the message to
the Status Monitor. The appropriate task receives the message by accepting a
rendezvous call from Input Server. All of the necessary action required of the task is
completed prior to the Input Server relinquishing processor control. In simulating
a failed node, the Input Server only allows status messages to be passed to Status
Monitor. The Input Server reads all other messages, but does not call the
respective tasks. Status messages must be passed to Status Monitor since node recovery
is triggered by the first periodic status message received after a node is restarted, as
explained later.
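
The routing decisions described in this section can be summarized in a short Python
sketch. Representing a message as a dictionary, the lowercase key names, the string
values "data" and "control", and the function name route are all assumptions made
for illustration; the thesis simulation itself uses Ada tasks and rendezvous calls.

```python
def route(msg, active_functions, node_failed=False):
    """Return the name of the layer or task that should receive msg."""
    if node_failed:
        # A simulated failed node reads every message but only passes
        # status messages on to the Status Monitor.
        is_status = (msg["kind"] == "control"
                     and msg.get("cntrl_action") == "STATUS")
        return "Status Monitor" if is_status else None
    if msg["kind"] == "data":
        if msg["dest_func"] in active_functions:
            return "AL"                       # function is active on this node
        return "LIFFCL non-active queue"      # held for a possible migration
    action = msg["cntrl_action"]
    if action in ("FNON", "FNOFF"):
        return "RL"
    if action in ("MKR", "CHKPT"):
        return "Checkpoint"
    if action == "STATUS":
        return "Status Monitor"
    raise ValueError(f"unknown control action: {action}")
```

For example, a data message for a function not assigned to the node is routed to the
non-active queue, so that the message is already present if the function later
migrates to this node.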

C.  OUTPUT  SERVER 

The Output Server is responsible for ordering all message traffic generated by
tasks within a node and relaying this traffic to the NCL. Ordering of a node’s
message traffic is accomplished utilizing queue management techniques as described in
the previous section. Since all tasks within a node are concurrent processes, messages
are placed into the Output Server message queue autonomously. For this reason, the
queue management variables must be accessible to any task which generates message
traffic. Proper maintenance of this queue ensures the chronological ordering of
message generating events occurring internally to a node. When a task places a message
into the Output Server queue for transmission, the task sets the boolean variable
MSG_TO_SEND to true. Another boolean variable, BLOCK_WRITE, is utilized to
prevent tasks from overwriting a message in the Output Server queue before it can
be passed to the NCL. During each periodic activation, if MSG_TO_SEND is true,
the next available message in the Output Server queue is read from the queue and
written into the NCL queue. Prior to placing a message into the NCL queue, Output
Server appends a logical time stamp to the message for chronological identification
purposes. The Output Server can only send message traffic if a BLOCK_WRITE
condition does not exist within the NCL. The Output Server at any given node
relays at most one message during a given activation period. This prevents a given
node’s Output Server from monopolizing the network.
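
One activation of the Output Server can be sketched in Python as shown below. The
class and method names, the use of an integer counter as the logical time stamp, and
the time_stamp key are assumptions for illustration only; the sketch also uses an
unbounded deque in place of the circular queue described in the text.

```python
from collections import deque


class OutputServer:
    """Relays queued messages to the NCL, one per activation (sketch only)."""

    def __init__(self):
        self.queue = deque()   # messages placed by tasks within the node
        self.clock = 0         # hypothetical logical time stamp counter

    def submit(self, msg):
        """Called by any task within the node that generates message traffic."""
        self.queue.append(msg)

    def activate(self, ncl_queue, ncl_block_write=False):
        """Relay at most one message; return the stamped message or None."""
        if not self.queue or ncl_block_write:
            return None
        self.clock += 1
        stamped = dict(self.queue.popleft(), time_stamp=self.clock)
        ncl_queue.append(stamped)
        return stamped
```

Limiting each activation to a single message means a node with a long backlog still
yields the network to other nodes between its own transmissions.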

D.  STATUS  MONITOR 

The overall purpose of the Status Monitor is to provide fault-tolerant facilities
for the node by maintaining the current operational status of all system nodes in its
NST. This is accomplished through the three functions that Status Monitor
performs: generating periodic status messages indicating the health of the node,
monitoring and maintaining a timer array within the NST to detect failure of other
nodes, and processing all status messages received by the node. The
health of the node is determined by the AL, and is a reflection of the node’s ability
to complete assigned functions prior to their deadline. A load percentage greater
than one indicates an overloaded node. Fault detection is achieved by monitoring the
receipt of these periodic status messages from other system nodes. If a periodic status
message is not received within a specified interval, node failure is assumed and the
appropriate node is reflected as down in the NST. Aperiodic messages are utilized by
the Status Monitor only during recovery procedures.

Status Monitor, accessible from the Input Server, consists of two independent
tasks, Status Broadcast (SB) and Status Received (SR). The Status Broadcast is
activated on a periodic basis, utilizing a simple delay statement. The activation of
the Status Received is via a rendezvous call from the Input Server upon receipt of a
status message. The primary means of determining node status is for each node to
periodically broadcast its load percentage to all other nodes. In turn, each node
waits for these broadcasts as confirmation that other nodes are in fact operational.
The Status Monitor at each node maintains a 1 by N array, each element containing
the Time-of-Receipt (TOR) of the last status message received from the appropriate
node. This value is used in comparisons with the Real-Time-Clock (RTC) to determine
if nodes have failed to transmit periodic status messages. If a given node’s Status
Monitor detects the failure of another node, then it logs the failed node as down in
the NST, and notifies the Node Failure routine.

1.  Status  Message  Receipt 

As  previously  indicated,  two  types  of  status  messages  are  utilized,  periodic 
and  aperiodic,  both  of  which  are  control  type  messages  with  the  CONTROL_ACTION 
field  set  equal  to  status.  All  status  messages  received  by  the  Input  Server  are  passed 
to  the  Status  Monitor  for  appropriate  action. 

Periodic messages are used to promulgate the fact that a node is operational, as well
as to indicate its current load percentage. These messages are indicated by the
presence of a "1" in the DEST_NODE field of the message, with the load percentage
contained in the DEST_FUNC field. This loading information is utilized by the RL at
each node in determining the receiving node in overload and recovery conditions.
Recovery and overload conditions are covered in another thesis.

The aperiodic messages are indicated by the presence of a "2" in the DEST_NODE field
of the message. Aperiodic messages are transmitted in conjunction with a node recovery
only. Upon restart, the recovering node transmits an aperiodic message with the load
equal to zero, receipt of which causes all active nodes to transmit an aperiodic
message containing the common and unique sections of their NST.
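The distinction between the two status message types can be sketched as follows. This is a minimal Python illustration rather than the thesis's Ada code; the `StatusMessage` class, the `loads` and `last_heard` dictionaries, and the returned action names are hypothetical stand-ins for the message record and the NST entries.

```python
from dataclasses import dataclass

PERIODIC = 1    # DEST_NODE value that marks a periodic status message
APERIODIC = 2   # DEST_NODE value that marks an aperiodic status message


@dataclass
class StatusMessage:
    orig_node: int   # sending node
    dest_node: int   # message subtype: 1 = periodic, 2 = aperiodic
    dest_func: int   # load percentage carried by a periodic message


def handle_status(msg, loads, last_heard, now):
    """Return the action a receiving node takes for one status message.

    loads and last_heard are hypothetical per-node dictionaries standing in
    for the load and timer entries kept with the NST.
    """
    if msg.dest_node == PERIODIC:
        loads[msg.orig_node] = msg.dest_func    # record the sender's load
        last_heard[msg.orig_node] = now         # refresh the sender's TOR
        return "update_nst"
    if msg.dest_node == APERIODIC:
        # A node is recovering: reply with the common and unique NST sections.
        return "send_nst_sections"
    raise ValueError(f"unknown status subtype {msg.dest_node}")
```

The sketch only covers the receiving side of a recovery request; how a recovering node merges the NST sections it gets back is not modeled here.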

2.  Status  Message  Broadcast 

The Status Broadcast periodically generates local status broadcast messages and checks
the timeout conditions of other nodes. On each activation, Status Broadcast obtains
the current value of the RTC and compares it to the TOR of the last status message
received from each of the other nodes. If this time differential is greater than a
predetermined Timeout interval, the associated node is reflected as down in the NST
and the Node Failure task is called.
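The timeout comparison described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the thesis's Ada implementation: the dictionaries `tor` and `status` are hypothetical stand-ins for the timer array and the node status entries of the NST, and skipping the node's own entry is an assumption.

```python
def check_timeouts(tor, status, now, timeout, self_id):
    """Mark as down every node whose last status message is too old.

    tor maps node id to the time of receipt of its last status message,
    status maps node id to "up" or "down". Returns the newly failed nodes,
    i.e. the ones a Node Failure routine would be called for.
    """
    newly_failed = []
    for node, last_rx in tor.items():
        if node == self_id or status.get(node) == "down":
            continue                      # skip ourselves and known failures
        if now - last_rx > timeout:       # differential exceeds the interval
            status[node] = "down"
            newly_failed.append(node)
    return newly_failed
```

Calling the function again with the same inputs reports nothing new, since a node already marked down is skipped.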

E.  CHECKPOINTING  PROCEDURES 

Checkpointing procedures are the cornerstone of a distributed system's framework. As
stated earlier, the main purpose of conducting checkpoints is to establish globally
consistent points which serve as synchronization points during reconfiguration
procedures. The local state of a node is defined by its initial state and the sequence
of events that have occurred at that node since the previous checkpoint. An event
occurs each time a message is received. A checkpoint is merely a snapshot of the local
state of a node at some point in time. A set of checkpoints, one for each node in the
system, is called a global checkpoint and is consistent if all snapshots form a
consistent global state [Ref. 6].

Checkpoint contains two independently activated task bodies, Check Pt and Event Cnt.
Task Check Pt is activated by a rendezvous call from the Input Server upon receipt of
a marker or checkpoint complete message. Event Cnt, activated periodically by the use
of a delay statement, monitors the number of messages received by a given node and
generates a marker message after a pre-determined number of messages has been
received.
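The event counting performed by Event Cnt can be sketched as a counter that signals when a marker message is due. This is a Python illustration only; the class name is hypothetical, the threshold of 25 follows the comment on EVNT_CNT in Appendix A, and resetting to zero is a simplification of this sketch.

```python
class EventCounter:
    """Counts received messages and reports when a checkpoint is due."""

    def __init__(self, threshold=25):
        self.threshold = threshold
        self.count = 0

    def record_event(self):
        """Register one received message; True means send a marker now."""
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0        # start counting toward the next checkpoint
            return True
        return False
```

With a threshold of three, the third and sixth calls to `record_event` return `True` and every other call returns `False`.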

Checkpointing is conducted independently at each node. Checkpointing procedures are
initiated by the first node to accumulate the pre-determined number of events. This
node broadcasts a marker message containing its unique section of the NST. Upon
receipt of this marker message, other nodes conduct a local checkpoint, if not already
accomplished, and update their NST with the unique section contained in the body of
the marker message. Additionally, when the first marker message is received at a given
node, the node also transmits a marker message containing its own unique section of
the NST. Requiring each node in turn to transmit a marker message ensures that all
nodes have exact replicas of the unique sections of the NST. When the node originating
the checkpoint has received a marker message from all other active nodes, it transmits
a checkpoint complete message (CHKPT). The communication protocol, a first-in-first-out
network, ensures that delivery of the checkpoint complete message to each node occurs
after all associated marker messages have been received. This ensures complete and
identical NSTs at each node.

Since there is no global synchronization of checkpointing events, the possibility
exists that a node is required to alter its NST between the time of its local
checkpoint and receipt of marker messages from all other nodes. This case is handled
through the use of a temporary copy of a node's unique section, made at checkpoint
time. The contents of the marker messages are retained in the temporary variable until
a checkpoint complete message is received, at which time the temporary variable is
written into the NST and the entire NST is duplicated in the backup copy NSTBAK. This
method of retaining a backup copy of the NST ensures that a globally consistent copy
of the previous checkpoint is still available in the event that a node failure occurs
during checkpoint procedures.
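The marker exchange described above can be sketched as a small simulation. This is a Python illustration under simplifying assumptions, not the Ada code of Appendix A: a single `deque` stands in for the first-in-first-out broadcast network, each unique section is a plain dictionary, and the names `Node`, `originate`, `commit` and `run` are hypothetical.

```python
import copy
from collections import deque


class Node:
    def __init__(self, node_id, all_ids, unique):
        self.id = node_id
        self.peers = [n for n in all_ids if n != node_id]
        self.unique = unique      # live unique section, may keep changing
        self.nst = {}             # sections from the last completed checkpoint
        self.nstbak = {}          # backup of the last consistent NST
        self.temp = {}            # sections collected for the open checkpoint
        self.local_done = False
        self.originator = False

    def local_checkpoint(self, net):
        self.local_done = True
        self.temp[self.id] = copy.deepcopy(self.unique)   # snapshot taken now
        net.append(("MKR", self.id, copy.deepcopy(self.unique)))

    def originate(self, net):
        self.originator = True
        self.local_checkpoint(net)

    def receive(self, msg, net):
        kind, sender, section = msg
        if sender == self.id:
            return                                        # own broadcast
        if kind == "MKR":
            if not self.local_done:
                self.local_checkpoint(net)                # first marker seen
            self.temp[sender] = copy.deepcopy(section)
            if self.originator and all(p in self.temp for p in self.peers):
                net.append(("CHKPT", self.id, None))      # all markers are in
                self.commit()
        elif kind == "CHKPT":
            self.commit()

    def commit(self):
        self.nst.update(self.temp)                # temporary copy into the NST
        self.nstbak = copy.deepcopy(self.nst)     # duplicate into NSTBAK
        self.temp = {}
        self.local_done = False
        self.originator = False


def run(nodes, net):
    """Deliver every queued broadcast, in FIFO order, to all nodes."""
    while net:
        msg = net.popleft()
        for node in nodes.values():
            node.receive(msg, net)
```

Because the queue is first-in-first-out, the checkpoint complete message is delivered only after every marker, so each node commits the same set of sections. A change made to a node's live section after its local checkpoint does not leak into the committed state, which is the role the temporary copy plays in the text.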
IV. STATE DIAGRAM REPRESENTATION OF TASKS


A.  GENERAL 

As  previously  mentioned,  all  tasks  within  the  LIFFCL  are  concurrent  processes. 
Input  Server  and  Output  Server  are  periodic  tasks  which  are  activated  through  the 
use  of  a  time  delay.  A  delayed  task  is  suspended  by  the  node’s  operating  system  during 
the  period  of  the  delay.  Tasks  Status  Monitor  and  Checkpoint  are  activated  by  a 
rendezvous  call  from  the  Input  Server  upon  receipt  of  certain  message  types.  This 
chapter  illustrates  the  logical  progression  of  events  occurring  within  the  indicated 
task  as  shown  in  the  state  diagram.  The  actual  implementation  of  the  user  program 
is  covered  in  the  next  chapter. 

B.  INPUT  SERVER  TASK 

Input  Server  periodically  checks  its  queue  for  a  message  received.  If  a  message 
is  to  be  processed,  it  parses  at  most  two  fields  to  determine  the  message  type  as  shown 
in  Figure  4.1.  Depending  on  its  type,  the  message  is  passed  to  the  appropriate  layer  for 
further  processing  in  order  to  complete  the  necessary  action  required  by  the  message. 
If  no  message  is  present,  Input  Server  releases  the  processor. 
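The two-field parse performed by the Input Server can be sketched as a routing function. This is a Python illustration, not the thesis's Ada task: the action names follow the ACTION_TYPE declaration in Appendix A, while the destination labels are hypothetical and the handling of data messages (queued for the application functions) is an assumption of this sketch.

```python
CONTROL_ROUTES = {
    "MKR": "checkpoint",              # marker message
    "CHKPT": "checkpoint",            # checkpoint complete message
    "STATUS": "status_monitor",       # periodic or aperiodic status
    "FNON": "reconfiguration_layer",  # function on
    "FNOFF": "reconfiguration_layer", # function off
}


def route(msg_kind, cntrl_action=None):
    """Pick the layer that handles a message from at most two fields."""
    if msg_kind == "data":
        return "data_queue"           # assumption: data goes to a message queue
    if msg_kind != "control":
        raise ValueError(f"unknown message kind {msg_kind!r}")
    return CONTROL_ROUTES[cntrl_action]   # second field is read only if needed
```

A data message is routed after reading one field; a control message needs the second field, which matches the "at most two fields" wording above.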

Figure 4.1: Input Server State Diagram (branches: DATA TYPE MSG, CONTROL TYPE MSG)

C. OUTPUT SERVER TASK

Output Server checks flags set by tasks within the different layers of the node to
determine if a message is available for transmission. When one is, the Output Server
transfers the message from its own queue to the queue of NCL. Output Server ensures
the NCL queue is not full before writing the message in this queue. A full queue is
indicated by the NCL variable BLOCK_WRITE being true. It also time stamps the message
to ensure its ordering. These events are illustrated in Figure 4.2.
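The transfer from a node's output queue to the NCL queue can be sketched with a small circular queue. This is a Python illustration under stated assumptions rather than the Ada implementation: the default size of 15 follows Q_SIZE in Appendix A, `block_write` and `msg_to_send` are modeled as derived properties instead of stored flags, and the `tot` key is a hypothetical stand-in for the Time of Transmit field.

```python
class MsgQueue:
    """Fixed-size circular queue with a full indicator."""

    def __init__(self, size=15):
        self.size = size
        self.slots = [None] * size
        self.rd = 0        # read pointer
        self.wr = 0        # write pointer
        self.count = 0

    @property
    def block_write(self):
        return self.count == self.size    # queue is full

    @property
    def msg_to_send(self):
        return self.count > 0             # queue holds at least one message

    def write(self, msg):
        if self.block_write:
            return False
        self.slots[self.wr] = msg
        self.wr = (self.wr + 1) % self.size
        self.count += 1
        return True

    def read(self):
        if not self.msg_to_send:
            return None
        msg = self.slots[self.rd]
        self.slots[self.rd] = None
        self.rd = (self.rd + 1) % self.size
        self.count -= 1
        return msg


def output_server_step(outq, ncl_q, now):
    """One activation: move messages to the NCL queue until it is full."""
    moved = 0
    while outq.msg_to_send and not ncl_q.block_write:
        msg = outq.read()
        msg["tot"] = now          # time stamp used to order messages
        ncl_q.write(msg)
        moved += 1
    return moved
```

Messages that do not fit stay in the node's own queue and are moved on a later activation, once the NCL queue has been drained.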


Figure 4.2: Output Server State Diagram

D. STATUS MONITOR TASK

As indicated previously, Status Monitor performs three different functions. Two of
these functions, Status Broadcast and Timeout, generate periodic status messages for
the node and monitor the receipt of these messages from other nodes. Additionally,
Status Received is invoked by the Input Server upon receipt of both periodic and
aperiodic status messages. The three functions and their resulting events are shown in
Figures 4.3 and 4.4.

E.  CHECKPOINT  TASK 

Checkpoint processes two types of messages pertaining to checkpointing. A marker
message initiates checkpointing if not already in progress, and a checkpoint complete
message signifies the successful completion of a checkpoint. Information pertaining to
a node's functions is sent in the marker message so all nodes can update their NSTs.
Upon completion of checkpointing, a backup copy of the NST is made. This backup copy
is utilized during node failure, since the failed node is unable to pass the
statistics of its active functions. Two procedures are utilized to process the
different message types as shown in Figure 4.5.

Figure 4.3: Status Monitor Broadcast and Timeout State Diagrams

Figure 4.4: Status Monitor Message Received State Diagram

Figure 4.5: Checkpoint State Diagram

V. A SIMULATION USING ADA


A.  GENERAL 

The  simulation  of  a  four  node,  twelve  function,  distributed  system  is  imple¬ 
mented  as  a  group  of  independent  Ada  packages.  Each  node  is  comprised  of  the 
Output  and  Input  Servers,  the  Status  Monitor,  Checkpoint,  and  the  RL.  All 
these  components  are  instantiated  for  each  node  and  are  referred  to  as  the  node  re¬ 
lated  components.  The  system  also  contains  community  components  which  include 
a  globally  ordered  communication  network  (NCL),  a  random  event  generator  (EG), 
and  a  front  end  processor  (FEP). 

B. SYSTEM-WIDE COMMUNITY COMPONENTS

The community components explained in this section are the system components not
utilized in the actual processing of data or control type messages.

NCL  is  used  to  simulate  the  transmission  of  messages  from  the  nodes’  Output 
Servers  via  a  broadcast  network.  The  Input  Servers  receive  these  messages  from 
the  NCL  utilizing  a  circular  queue.  The  delay  difference  between  the  NCL,  Output 
Server,  and  the  Input  Server  determines  the  number  of  messages  in  the  queue  at 
any  given  time. 

The random event generator is activated periodically to simulate a real-time event. It
simulates node overload and node failure. This simulation verifies the sequence of
events occurring within the LIFFCL as a result of node failure/repair and overload
conditions. The reconfiguration events normally occurring as a result of this
simulation occur primarily in the RL layer and are covered in another thesis [Ref. 8].

C. NODE RELATED COMPONENTS


The node related components are algorithms and tasks utilized for processing the
different types of messages received by a node. These components are used to implement
each node and are explained in this section.

The Input Server contains two independent task bodies, Build Node and Receive Message.
The Build Node task is utilized by the Front End Processor only during the
initialization of nodes as described previously. The Receive Message task receives
messages from the NCL via a circular queue. The messages received are parsed to
determine the necessary action to be taken. Input Server establishes a rendezvous with
either the Checkpoint, Status Monitor, or the RL based on the contents of the MSG_KIND
field of a message.

The  Output  Server  consists  of  a  single  task  activated  periodically  by  the  ex¬ 
piration  of  a  delay  statement.  It  sends  any  available  messages  to  NCL  during  its 
activation  period. 

Checkpoint  handles  the  process  of  checkpointing  and  ensures  that  a  consistent 
global  state  is  maintained.  Any  node  can  originate  the  checkpoint  process  by  con¬ 
ducting  a  local  checkpoint  and  sending  a  marker  message  containing  its  unique  data. 
The  node  originating  the  checkpoint  must  keep  track  of  marker  messages  received 
from  other  nodes  and  indicate  when  the  checkpoint  is  complete.  Upon  receipt  of  the 
marker messages, all the nodes must store the information passed. This process
continues until a checkpoint complete message, sent by the originator, is received by
all nodes.

As indicated in Chapter III, the Status Monitor consists of three independent tasks:
Status Broadcast, Timeout, and Status Received. Status Broadcast and Timeout are
activated periodically by the expiration of a delay statement, and Status Received
establishes a rendezvous with the Input Server. Status Broadcast is responsible for
building and sending the periodic message to the Output Server. Timeout detects the
failure of a node to respond with a periodic message within a specified time interval.
Status Received processes both periodic and aperiodic messages. For periodic messages,
a node only updates the NST. Aperiodic messages signal a node recovery; therefore, a
node must respond by sending the unique and common sections of its NST.

D.  VERIFICATION  OF  STATE  DIAGRAMS 

To  illustrate  the  correctness  of  the  state  diagrams  shown  in  the  previous  chap¬ 
ters,  timing  diagrams  are  provided.  They  reflect  the  sequence  of  events  occurring  at 
a  node  during  simulation  following  the  receipt  of  messages  built  and  sent  by  either 
the  Event  Generator  or  the  implemented  tasks  of  the  LIFFCL. 

Maintaining  the  global  state  of  the  system  is  accomplished  by  utilizing  check¬ 
pointing  procedures.  Checkpoint  is  initiated  by  the  first  node  to  record  a  predeter¬ 
mined  number  of  events.  This  node  is  designated  as  the  checkpoint  originator.  As 
shown  in  Figure  5.1,  node  1  originates  the  checkpoint.  The  arcs  represent  the  message 
transmission  time  between  nodes.  Nodes  2,  3  and  4  respond  to  the  marker  message  by
conducting  a  local  checkpoint  and  transmitting  a  marker  message.  Also  it  is  worth 
noting  that  only  one  node  is  active  at  any  given  time.  When  node  1  has  received 
a  marker  message  from  all  nodes,  it  sends  a  checkpoint  complete  message  signifying 
a  globally  consistent  checkpoint  has  been  attained.  Upon  receipt  of  this  checkpoint 
complete  message,  each  node  stores  the  checkpoint  data  into  NSTBAK. 

Figure 5.1: Checkpointing Events (timing diagram for nodes N1 through N4. N1 sends the
first marker (MKR) and updates its NST; N3, N4 and N2 then send their own markers, and
each node updates its NST as markers arrive; once N1 has a checkpoint it sends the
checkpoint message, and each node saves NSTBAK on receipt. Time stamps legible in the
source: 590.70, 591.03, 593.43, 596.16, 597.53, 603.00.)

In order for the health of the nodes to be monitored, periodic status messages are
sent by each node. Each node records the load of the node which sent the periodic
message. A timer is used to determine if a node responded on time with this message. A
diagram listing the periodic events that occur at each node in response to the receipt
of these periodic messages is illustrated in Figure 5.2.


E.  SUMMARY 

The  actual  code  implemented  in  this  simulation  model  is  contained  in  Appendix 
A.  The  simulation  output  is  contained  in  Appendix  B.  Comments  have  been  inserted 
in  the  areas  where  an  algorithm  or  procedure  needs  to  be  placed.  Areas  requiring 
further  development  are  covered  in  the  next  chapter.

Time at which each node received the periodic status message sent by N1 through N4.
The sending node updates its own timer; every other node updates the sender's timer.
Entries marked [sic] are garbled in the source and are reproduced as found.

Sender   Rcvd at N1   Rcvd at N2      Rcvd at N3      Rcvd at N4
N1       432.64       43166 [sic]     432.65          43166 [sic]
N2       435.37       435.39          435379 [sic]    43538 [sic]
N3       438.11       438.13          438.12          438.12
N4       440.85       440.87          440.86          440.86

Figure 5.2: Periodic Message Processing

VI. CONCLUSIONS AND FUTURE WORK


A.  GENERAL 

In  this  thesis,  a  scheme  for  building  robust,  fault  tolerant,  distributed  systems 
is  presented.  The  proposed  fault  detection  methodology,  combined  with  the  indepen¬ 
dent  checkpointing  and  recovery  techniques,  is  an  effective  means  of  obtaining  fault 
tolerance.  The  checkpointing  procedures  enable  a  globally  consistent  system  state 
to  be  stored  at  every  node,  allowing  for  robust  reconfiguration  efforts  as  a  result  of 
transient  failures.  Additionally,  the  duplication  of  all  application  code  at  each  node 
reduces  the  communications  normally  associated  with  rollback/recovery  and  func¬ 
tion  migration.  Also,  requiring  nodes  to  store  all  data  messages  received  prevents 
retransmission  of  requisite  message  traffic  during  function  migration. 

B.  CONCLUSION 

The  fault  tolerance  implementation  described  is  a  simple  yet  effective  means 
for  detecting  node  failure.  However,  in  some  critical  real-time  systems,  the  lag  time 
between  failure  and  its  detection  may  need  to  be  reduced.  A  reduction  can  be  ob¬ 
tained  by  simply  increasing  the  frequency  with  which  the  timeout  array  contents  are 
examined.  The  trade-off  is  a  reduction  in  the  time  slice  that  a  node  can  dedicate  to 
application  processing. 
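The trade-off can be made concrete with a small calculation. The sketch below is a Python illustration under stated assumptions, not a result from the simulation: time is measured in integer units such as milliseconds, the timeout array is examined at multiples of `check_period`, and the lag is measured from the last status message received, which approximates the failure time when a node fails right after transmitting.

```python
def detection_lag(last_rx, timeout, check_period):
    """Time from the last status message to the check that flags the node.

    A node is flagged at the first check instant for which
    now - last_rx > timeout, so the lag always lies in the interval
    (timeout, timeout + check_period].
    """
    first_check = ((last_rx + timeout) // check_period + 1) * check_period
    return first_check - last_rx
```

For example, with a timeout of 5000 and a last message received at 1003, checking every 2000 units gives a lag of 6997, while checking every 500 units gives 5497. Shortening the check period therefore tightens the bound toward the timeout itself, at the cost of more frequent activations.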

The proposed asynchronous checkpointing scheme appears to provide better throughput
and response time by eliminating the synchronization overhead normally required in
creating globally consistent checkpoints. The domino effect, normally associated with
asynchronous checkpointing, is alleviated by maintaining a backup copy of the previous
globally consistent checkpoint data. Should node failure occur during the process of
checkpointing, the recovered functions need only roll back to the previous checkpoint.

The  availability  of  large  quantities  of  RAM  storage  makes  the  storage  of  all 
messages  received  an  alternative.  Rollback/recovery  time  increases  dramatically  if 
nodes  are  required  to  retransmit  all  requisite  traffic  for  a  recovering  node.  The  lin¬ 
ear  processing  time  required  for  message  queue  manipulation  during  checkpointing 
is  negligible  compared  to  the  overhead  required  for  retransmission.  Furthermore, 
achievement  of  a  globally  consistent  state  upon  recovery  requires  all  messages  to  be 
logged  at  either  the  transmitting  or  receiving  node.  It  is  believed  to  be  advantageous 
to  maintain  the  queue  as  a  receive  queue. 

C.  FUTURE  WORK 

In  order  to  fully  realize  the  capabilities  of  the  proposed  scheme,  a  more  intensive 
analysis  on  a  multi-processor  implementation  is  required.  A  complete  multi-layered 
system  as  depicted  in  Figure  1.2  must  be  utilized  to  analyze  the  periodicity  relation¬ 
ship  between  the  NCL,  LIFFCL,  RL  and  AL.  A  multi-processor  environment  would 
also  yield  a  more  realistic  indication  of  the  relationship  between  the  frequency  of 
checkpointing  and  failure  recovery  time.  To  enable  truly  independent  functionality 
among  the  software  layers  of  the  node,  circular  queues  should  be  implemented  in 
each  task.  This  prevents  the  Input  Server  from  tying  up  the  processor  until  a  task 
completes  the  action  required  by  a  message.  Also  the  development  of  the  Timeout 
routine  as  a  separate  task  would  reduce  the  frequency  with  which  Status  Broadcast 
is  currently  being  activated  but  still  maintain  a  short  detection  time. 

Additionally,  queue  management  for  data  messages  must  be  implemented  in 
order  to  support  the  future  development  of  the  AL  software.  The  AL  software  must 
also  provide  an  interface  to  the  RL  and  LIFFCL  layers.

APPENDIX A: SIMULATION CODE


/* This program code is part of a joint project. Members of */
/* the project team are as follows: S. Shukla, C. Yang, */
/* R. Puett, and K. Lehman */
/* The code is given in its entirety for completeness of */
/* the topics covered in this thesis */
/* The code is in no particular order except for the first few */
/* sections which are the base for the remaining sections. */
/* Each section has comments preceding it and before each sub- */
/* section or task/procedure within the section to define what */
/* is occurring within that section. */
/* The first section contains the DECLARATIONS which are */
/* used throughout the program. For each of the remaining */
/* sections, a specification package precedes the package body. */
/* The package PROCESS is the second section because it needs */
/* to be compiled before the packages following it. It is the */
/* package that contains the algorithms. The next section is */
/* TRAND. It is the random number generator and needs to be */
/* compiled prior to compiling COMMNET which follows TRAND. */
/* COMMNET creates the instantiations to form the nodes. The */
/* ordering of what follows from this point on does not matter. */
/* The remaining sections are listed in the following order: */
/* INS - contains the NODE_INITIALIZER and INPUT_SERVER tasks */
/* OUTS - contains the OUTPUT_SERVER task */
/* CKPT - contains the CHECK_PT and EVENT_CNT tasks */
/* RL - contains the RECONF_LAYER task */
/* SM - contains the STATUS_REC and STATUS_BDCST tasks */
/* FP - contains the EVENT_MAKER, i.e., Event Generator */
/* FEP - Front-End Processor which opens output files for each */
/* node and initiates the NST for each node. */


with text_io; use text_io;
with calendar; use calendar;
package DECLARATIONS is

   F1,F2,F3,F4 : FILE_TYPE;

   type MSG_TYPE is (data, control);
   type ACTION_TYPE is (MKR, FNON, FNOFF, STATUS, CHKPT);
   type IMCM is array(1..12, 1..12) of integer;  -- IPC comms array
   type FI is array(1..4) of integer;            -- function information params.
   type FL is array(1..12) of integer;           -- function location array
   type NSL is array(1..2, 1..4) of integer;     -- Node status and load
   type RCY is array(1..4) of integer;           -- array used when recovering
   type STAT_TIME is array(1..4) of float;       -- array used in each node to
                                                 -- record the times when status
                                                 -- msgs were sent by other nodes
   type FAIL_FLG is array(1..12) of boolean;     -- flags used to indicate a failed
                                                 -- node (see FAILED_NODE)
   type FUNCTION_REC is                          -- contents of the unique section
      record
         TTC           : float;
         TTD           : float;
         FN_INFO       : FI;
         LAST_MSG_PROC : float;
         LAST_MSG_REC  : float;
         REGISTER_VAL  : integer := 0;
         SYMBOL_VAR    : integer := 0;
      end record;

   type FUNCTION_STATS is array(1..12) of FUNCTION_REC;
   type UNIQUE is array(1..4) of FUNCTION_STATS;

   type COMMON is
      record
         NODE_STAT_LD : NSL;                     -- node status and load
         FN_LOC       : FL;
         IMC          : IMCM;
      end record;

   type BODY_TYPE is
      record
         DATA : string(1..80);
         UNIQ : FUNCTION_STATS;
         COMM : COMMON;
      end record;

   type MSG_RECORD is                            -- msg to be passed on the net
      record
         TOT          : float;                   -- Time of Transmit of a msg
         TOR          : float;                   -- Time of Receipt of a msg
         MSG_KIND     : MSG_TYPE;                -- type of msg
         DEST_FUNC    : integer := 0;            -- which fn a msg is sent to
         DEST_NODE    : integer := 0;            -- node who acts on a msg
         ORIG_FN_NODE : integer := 0;            -- originator (fn or Node) of msg
         CNTRL_ACTION : ACTION_TYPE;
         MSG_BODY     : BODY_TYPE;               -- msg that needs to be read
      end record;

   Q_SIZE : constant integer := 15;              -- size of message queues
   type QUEUE is array(1..Q_SIZE) of MSG_RECORD;

   type MSG_QUEUE is                             -- queue to hold msgs to send out
      record
         MSG_TO_SEND : boolean := false;         -- indicates if queue has a msg
         BLOCK_WRITE : boolean := false;         -- used to block writing to queue
         RD_CNT      : integer := 1;             -- the read pointer in queue
         MSG_CNT     : integer := 1;             -- the write pointer in queue
         MSG_QUE     : QUEUE;                    -- holds up to 15 msgs
      end record;

   type NODE_STATUS_TABLE is                     -- defines contents of the NST
      record
         COMMON_SECTION : COMMON;
         UNIQUE_SECTION : UNIQUE;
         NODE_ID        : integer := 0;
      end record;

   type VARIABLES is                             -- status conditions for a node
                                                 -- (local to each node)
      record
         RCVRY_IN_PROG  : boolean := false;      -- indicates node recovery
         RCVRY          : RCY;                   -- array used in rcvry process
         UNIQ_SENT      : boolean := false;      -- indicates if a unique section
                                                 -- was sent by a node
         CHKPT_TAKEN    : RCY;                   -- array used to indicate if a
                                                 -- checkpoint is complete or not
         CHKPT_ORIG     : boolean := false;      -- node originating chkpt
         CHKPT_COMPLETE : boolean := false;      -- a completed checkpoint done
         LOCAL_CHKPT    : boolean := false;      -- indicates if a node has taken
                                                 -- a checkpoint
         CHKPT_TIMER    : float;
         FIRST_MKR      : boolean := false;      -- flag to note 1st marker msg to
                                                 -- come across net - indicates a
                                                 -- checkpoint needs to occur
         EVNT_CNT       : integer := 0;          -- cnts up to 25 then resets to 1
                                                 -- (indicates when a chkpt needs
                                                 -- to be taken)
         EVNT_CNT_OUT   : integer := 0;          -- events sent by output server
         ACTIVE_FN_QUE  : QUEUE;                 -- msgs for assigned functions
         DATA_MSG_QUE   : QUEUE;                 -- holds msg for all functions
         OUTQ           : MSG_QUEUE;             -- queue to hold output msgs
         INQ            : MSG_QUEUE;             -- queue to hold input msgs
         TIMER          : STAT_TIME;             -- array to hold times of when
                                                 -- status msgs were sent
      end record;

   NST,NSTBAK  : array(1..4) of NODE_STATUS_TABLE;
   LOC_VAR     : array(1..4) of VARIABLES;       -- gives each node a set of Loc Vars
   ST          : array(1..4) of NODE_STATUS_TABLE; -- temporary copy of NST
   NET_BUSY    : boolean;                        -- indicates if network is tied up
   NET_Q       : MSG_QUEUE;                      -- queue to hold msgs for network
   FAILED_NODE : FAIL_FLG;                       -- used to indicate failed node

end DECLARATIONS;


with DECLARATIONS; use DECLARATIONS;
with TEXT_IO; use TEXT_IO;
package PROCESS is

   -- this procedure gets and prints the current value of real time
   procedure GET_REAL_TIME(NID : in integer; LT : in out float);

   -- this procedure processes a marker msg
   procedure MKR_MSG(M : in out MSG_RECORD; NID : in integer;
                     FLG : in out boolean);

   -- this procedure processes a function on msg
   procedure FN_ON_MSG(M : in MSG_RECORD; NID : in integer);

   -- this procedure processes a function off msg
   procedure FN_OFF_MSG(M : in out MSG_RECORD; NID : in integer;
                        MSG_FLAG : in out boolean);

   -- this procedure processes a status msg
   procedure STAT_MSG(M : in out MSG_RECORD; NID : in integer;
                      FLG : in out boolean);

   -- this procedure processes a checkpoint complete msg
   procedure CHK_PT_CMPLT_MSG(M : in MSG_RECORD; NID : in integer);

end PROCESS;


with text_io;
package FLOAT_INOUT is new TEXT_IO.FLOAT_IO(FLOAT);

with FLOAT_INOUT; use FLOAT_INOUT;
with text_io; use text_io;
with number_io; use number_io;
with integer_io; use integer_io;
with calendar; use calendar;
with DECLARATIONS; use DECLARATIONS;

-- The package PROCESS contains all the procedures necessary
-- to process the different types of messages that come into
-- the Input Server. Each procedure is preceded by a
-- description of its actions.
package body PROCESS is

   -- Procedure Get Real Time utilizes the system package
   -- calendar to access the Real time clock of the system
   -- processor. In this case, only the seconds portion of
   -- the calendar is utilized.
   procedure GET_REAL_TIME(NID : in integer; LT : in out float) is
      S : DAY_DURATION;
      R : TIME;
      T : float;
   begin
      R := clock;
      S := SECONDS(R);
      T := float(S);
      LT := T;
      case NID is                    -- write the time to the node's output file
         when 1 =>
            PUT(F1,T,6,5,0);
            SET_COL(F1,15);
            PUT(F1," Node #1");
         when 2 =>
            PUT(F2,T,6,5,0);
            SET_COL(F2,15);
            PUT(F2," Node #2");
         when 3 =>
            PUT(F3,T,6,5,0);
            SET_COL(F3,15);
            PUT(F3," Node #3");
         when 4 =>
            PUT(F4,T,6,5,0);
            SET_COL(F4,15);
            PUT(F4," Node #4");
         when others =>
            NULL;
      end case;
   end GET_REAL_TIME;

   -- Procedure Function On Message is called from the
   -- Reconfiguration task. It processes a FNON message
   -- and updates a Node's NST to reflect the indicated
   -- function's location.
   procedure FN_ON_MSG(M : in MSG_RECORD; NID : in integer) is
      Z,Y,X      : integer;
      GM         : MSG_RECORD;
      PT         : float := 0.0;
      DEACT_NODE : integer;
   begin
      GM := M;
      Z := NST(NID).NODE_ID;
      Y := M.DEST_FUNC;
      DEACT_NODE := NST(Z).COMMON_SECTION.FN_LOC(Y);
      NST(Z).COMMON_SECTION.FN_LOC(Y) := M.ORIG_FN_NODE;
      case Z is                      -- write info to specific output file
         when 1 =>
            GET_REAL_TIME(Z,PT);
            SET_COL(F1,25);
            PUT(F1,"R_L rcvd FN_ON from Node #");
            PUT(F1,M.ORIG_FN_NODE,1);
            SET_COL(F1,60);
            PUT(F1,"EVNT #");
            PUT(F1,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F1,72);
            if M.ORIG_FN_NODE = Z then    -- activating node - turns fn on
               PUT_LINE(F1,"I am the activating node and changing NST.");
            else
               if DEACT_NODE = Z then     -- deactivating node
                  PUT_LINE(F1,"I am the deactivating node and changing NST");
               else
                  PUT_LINE(F1,"Neither act/deact node and changing NST.");
               end if;
            end if;
            SET_COL(F1,72);               -- shows changes in NST from FNON
            for R in 1..12 loop
               PUT(F1,NST(Z).COMMON_SECTION.FN_LOC(R),3);
            end loop;
            NEW_LINE(F1);
         when 2 =>
            GET_REAL_TIME(Z,PT);
            SET_COL(F2,25);
            PUT(F2,"R_L rcvd FN_ON from Node #");
            PUT(F2,M.ORIG_FN_NODE,1);
            SET_COL(F2,60);
            PUT(F2,"EVNT #");
            PUT(F2,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F2,72);
            if M.ORIG_FN_NODE = Z then  -- activating node - turns fn on
               PUT_LINE(F2,"I am the activating node and changing NST.");
            else
               if DEACT_NODE = Z then  -- deactivating node
                  PUT_LINE(F2,"I am the deactivating node and changing NST");
               else
                  PUT_LINE(F2,"Neither act/deact node and changing NST.");
               end if;
            end if;
            SET_COL(F2,72);  -- shows changes in NST from FNON
            for R in 1..12 loop
               PUT(F2,NST(Z).COMMON_SECTION.FN_LOC(R),3);
            end loop;
            NEW_LINE(F2);
         when 3 =>
            GET_REAL_TIME(Z,PT);
            SET_COL(F3,25);
            PUT(F3,"R_L rcvd FN_ON from Node #");
            PUT(F3,M.ORIG_FN_NODE,1);
            SET_COL(F3,60);
            PUT(F3,"EVNT #");
            PUT(F3,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F3,72);
            if M.ORIG_FN_NODE = Z then  -- activating node - turns fn on
               PUT_LINE(F3,"I am the activating node and changing NST.");
            else
               if DEACT_NODE = Z then  -- deactivating node
                  PUT_LINE(F3,"I am the deactivating node and changing NST");
               else
                  PUT_LINE(F3,"Neither act/deact node and changing NST.");
               end if;
            end if;
            SET_COL(F3,72);  -- shows changes in NST from FNON
            for R in 1..12 loop
               PUT(F3,NST(Z).COMMON_SECTION.FN_LOC(R),3);
            end loop;
            NEW_LINE(F3);
         when 4 =>
            GET_REAL_TIME(Z,PT);
            SET_COL(F4,25);
            PUT(F4,"R_L rcvd FN_ON from Node #");
            PUT(F4,M.ORIG_FN_NODE,1);
            SET_COL(F4,60);
            PUT(F4,"EVNT #");
            PUT(F4,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F4,72);
            if M.ORIG_FN_NODE = Z then  -- activating node - turns fn on
               PUT_LINE(F4,"I am the activating node and changing NST.");
            else
               if DEACT_NODE = Z then  -- deactivating node
                  PUT_LINE(F4,"I am the deactivating node and changing NST");
               else
                  PUT_LINE(F4,"Neither act/deact node and changing NST.");
               end if;
            end if;
            SET_COL(F4,72);  -- shows changes in NST from FNON
            for R in 1..12 loop
               PUT(F4,NST(Z).COMMON_SECTION.FN_LOC(R),3);
            end loop;
            NEW_LINE(F4);
         when others =>
            NULL;
      end case;
   end FN_ON_MSG;

   -- Procedure Function Off Message is called by the Reconfiguration
   -- task. It processes a FNOFF message and determines if the node is
   -- to activate a function. It also generates a FNON message if
   -- necessary.

   procedure FN_OFF_MSG(M : in out MSG_RECORD; NID : in integer;
                        MSG_FLAG : in out boolean) is
      Z,Y : integer;
      J : MSG_RECORD;
      PT : float := 0.0;
   begin
      Z := NST(NID).NODE_ID;
      Y := M.DEST_NODE;
      GET_REAL_TIME(Z,PT);
      case Z is
         when 1 =>
            SET_COL(F1,25);
            PUT(F1,"R_L rcvd FN_OFF from Node #");
            PUT(F1,M.ORIG_FN_NODE,1);
            SET_COL(F1,60);
            PUT(F1,"EVNT #");
            PUT(F1,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F1,72);
            if Z = Y then
               PUT(F1,"FN_ON sent to activate FN #");
               PUT(F1,M.DEST_FUNC,2); NEW_LINE(F1);
            else
               PUT_LINE(F1,"No further action required ATT.");
            end if;
         when 2 =>
            SET_COL(F2,25);
            PUT(F2,"R_L rcvd FN_OFF from Node #");
            PUT(F2,M.ORIG_FN_NODE,1);
            SET_COL(F2,60);
            PUT(F2,"EVNT #");
            PUT(F2,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F2,72);
            if Z = Y then
               PUT(F2,"FN_ON sent to activate FN #");
               PUT(F2,M.DEST_FUNC,2); NEW_LINE(F2);
            else
               PUT_LINE(F2,"No further action required ATT.");
            end if;
         when 3 =>
            SET_COL(F3,25);
            PUT(F3,"R_L rcvd FN_OFF from Node #");
            PUT(F3,M.ORIG_FN_NODE,1);
            SET_COL(F3,60);
            PUT(F3,"EVNT #");
            PUT(F3,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F3,72);
            if Z = Y then
               PUT(F3,"FN_ON sent to activate FN #");
               PUT(F3,M.DEST_FUNC,2); NEW_LINE(F3);
            else
               PUT_LINE(F3,"No further action required ATT.");
            end if;
         when 4 =>
            SET_COL(F4,25);
            PUT(F4,"R_L rcvd FN_OFF from Node #");
            PUT(F4,M.ORIG_FN_NODE,1);
            SET_COL(F4,60);
            PUT(F4,"EVNT #");
            PUT(F4,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F4,72);
            if Z = Y then
               PUT(F4,"FN_ON sent to activate FN #");
               PUT(F4,M.DEST_FUNC,2); NEW_LINE(F4);
            else
               PUT_LINE(F4,"No further action required ATT.");
            end if;
         when others =>
            NULL;
      end case;
      -- activating node
      if Z = Y then
         -- create FNON msg to send
         J.MSG_KIND := CONTROL;
         J.DEST_FUNC := M.DEST_FUNC;
         J.ORIG_FN_NODE := Z;
         J.CNTRL_ACTION := FNON;
         -- set flag to indicate msg needs to go to OUTPUT_SERVER
         MSG_FLAG := true;
         M := J;
      end if;
   end FN_OFF_MSG;

   -- Procedure Status Message processes both periodic and aperiodic
   -- status messages. It is called by Status Monitor (SM). The
   -- recovery process is handled by this procedure. Recovery is
   -- accomplished by rebuilding the NST of the recovering node
   -- from the contents of aperiodic messages (i.e. the Unique
   -- Section).

   procedure STAT_MSG(M : in out MSG_RECORD; NID : in integer;
                      FLG : in out boolean) is
      X,Z,Y : integer;
      GM : MSG_RECORD;
      RCVRY_COMPLETE : boolean := false;
      MY_UNIQ_SENT : boolean := false;
      PT : float := 0.0;
   begin
      -- Dest_Node field is used to designate a periodic msg (1)
      -- or an aperiodic msg (2). The Dest_Fn field holds the value
      -- of the load of a node designated by the ORIG_FN_NODE.
      Z := NST(NID).NODE_ID;
      Y := M.DEST_FUNC;
      X := M.ORIG_FN_NODE;
      LOC_VAR(Z).TIMER(X) := M.TOR;  -- update periodic time of node
      NST(Z).COMMON_SECTION.NODE_STAT_LD(2,X) := M.DEST_FUNC;
                                     -- node load percentage.
      GET_REAL_TIME(0,PT);
      if LOC_VAR(Z).RCVRY_IN_PROG and
         PT - LOC_VAR(Z).TIMER(Z) > 61.5 then
         LOC_VAR(Z).RCVRY_IN_PROG := false;
         NST(Z).COMMON_SECTION.NODE_STAT_LD(1,Z) := 0;
         NST(Z).COMMON_SECTION.NODE_STAT_LD(2,Z) := 0;
         for J in 1..4 loop  -- clear rcvry array
            LOC_VAR(Z).RCVRY(J) := 0;
         end loop;
         case Z is
            when 1 =>
               GET_REAL_TIME(1,PT);
               SET_COL(F1,72);
               PUT_LINE(F1,"RCVRY attempts unsuccessful. Restart RCVRY");
            when 2 =>
               GET_REAL_TIME(2,PT);
               SET_COL(F2,72);
               PUT_LINE(F2,"RCVRY attempts unsuccessful. Restart RCVRY");
            when 3 =>
               GET_REAL_TIME(3,PT);
               SET_COL(F3,72);
               PUT_LINE(F3,"RCVRY attempts unsuccessful. Restart RCVRY");
            when 4 =>
               GET_REAL_TIME(4,PT);
               SET_COL(F4,72);
               PUT_LINE(F4,"RCVRY attempts unsuccessful. Restart RCVRY");
            when others =>
               NULL;
         end case;
      end if;

      if M.DEST_NODE = 1 then  -- periodic msg
         if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,X) = 0 and
            M.DEST_FUNC = 0 then
            LOC_VAR(Z).UNIQ_SENT := false;
            NST(Z).COMMON_SECTION.NODE_STAT_LD(1,X) := 1;
            FAILED_NODE(X) := false;
         end if;
         if not LOC_VAR(Z).RCVRY_IN_PROG and
            NST(Z).COMMON_SECTION.NODE_STAT_LD(1,Z) = 0 then
            PUT_LINE("BUILDING an APERIODIC message.");
            GM.DEST_NODE := 2;  -- build aperiodic status message
            GM.DEST_FUNC := 0;
            GM.ORIG_FN_NODE := Z;
            GM.CNTRL_ACTION := STATUS;
            GM.MSG_KIND := control;
            FLG := true;
            LOC_VAR(Z).RCVRY_IN_PROG := true;
            for I in 1..4 loop  -- reset timers of nodes other than the
               if I /= X then   -- node whose periodic msg was received
                  LOC_VAR(Z).TIMER(I) := PT;
               end if;
            end loop;
         end if;

      else  -- aperiodic msg
         if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,Z) = 0 then
            -- recovery node
            LOC_VAR(Z).RCVRY(X) := 1;
            if Z /= X then
               NST(Z).UNIQUE_SECTION(X) := M.MSG_BODY.UNIQ;
               NST(Z).COMMON_SECTION := M.MSG_BODY.COMM;
            end if;
            RCVRY_COMPLETE := true;
            for I in 1..4 loop  -- check if all nodes sent the
                                -- unique sections
               if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,I) = 1 then
                  -- active node
                  if LOC_VAR(Z).RCVRY(I) = 0 then
                     RCVRY_COMPLETE := false;
                  end if;
               end if;
            end loop;
            if RCVRY_COMPLETE then
               -- call the node recovery procedure
               GM.DEST_NODE := 1;  -- build periodic status message
               GM.DEST_FUNC := 0;  -- indicates rcvry complete to
                                   -- other nodes
               GM.ORIG_FN_NODE := Z;
               GM.CNTRL_ACTION := STATUS;
               GM.MSG_KIND := control;
               FLG := true;
               LOC_VAR(Z).RCVRY_IN_PROG := false;
               for J in 1..4 loop  -- clear rcvry array
                  LOC_VAR(Z).RCVRY(J) := 0;
               end loop;
            end if;
         else  -- not the orig node of APERIODIC
            -- chk if unique section was sent
            if not LOC_VAR(Z).UNIQ_SENT then
               GM.DEST_NODE := 2;  -- build an aperiodic status message
               GM.DEST_FUNC := NST(Z).COMMON_SECTION.NODE_STAT_LD(2,NID);
               GM.ORIG_FN_NODE := Z;
               GM.MSG_BODY.UNIQ := NST(Z).UNIQUE_SECTION(Z);
               GM.MSG_BODY.COMM := NST(Z).COMMON_SECTION;
               GM.CNTRL_ACTION := STATUS;
               GM.MSG_KIND := control;
               FLG := true;
               MY_UNIQ_SENT := true;
               LOC_VAR(Z).UNIQ_SENT := true;
            end if;  -- UNIQ_SENT
         end if;
      end if;

      GET_REAL_TIME(Z,PT);
      case Z is
         when 1 =>
            SET_COL(F1,25);
            if M.DEST_NODE = 1 then
               PUT(F1,"S_M rcvd PERIODIC from Node #");
            else
               PUT(F1,"S_M rcvd APERIODIC from Node #");
            end if;
            PUT(F1,M.ORIG_FN_NODE,1);
            SET_COL(F1,60);
            PUT(F1,"EVNT #");
            PUT(F1,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F1,72);
            if M.DEST_NODE = 1 then
               PUT(F1,"Reset Timer element of Node #");
               PUT(F1,M.ORIG_FN_NODE,1);
               NEW_LINE(F1);
            else
               if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,Z) = 0 then
                  if RCVRY_COMPLETE then
                     PUT_LINE(F1,"Recovery complete,send PERIODIC msg");
                  else
                     PUT_LINE(F1,"This is the recovering node.");
                  end if;
               else
                  if LOC_VAR(Z).UNIQ_SENT and MY_UNIQ_SENT then
                     PUT_LINE(F1,"Sending APERIODIC with uniq sect.");
                  else
                     PUT_LINE(F1,"APERIODIC response sent, no action.");
                  end if;
               end if;
            end if;
         when 2 =>
            SET_COL(F2,25);
            if M.DEST_NODE = 1 then
               PUT(F2,"S_M rcvd PERIODIC from Node #");
            else
               PUT(F2,"S_M rcvd APERIODIC from Node #");
            end if;
            PUT(F2,M.ORIG_FN_NODE,1);
            SET_COL(F2,60);
            PUT(F2,"EVNT #");
            PUT(F2,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F2,72);
            if M.DEST_NODE = 1 then
               PUT(F2,"Reset Timer element of Node #");
               PUT(F2,M.ORIG_FN_NODE,1);
               NEW_LINE(F2);
            else
               if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,Z) = 0 then
                  if RCVRY_COMPLETE then
                     PUT_LINE(F2,"Recovery complete,send PERIODIC msg");
                  else
                     PUT_LINE(F2,"This is the recovering node.");
                  end if;
               else
                  if LOC_VAR(Z).UNIQ_SENT and MY_UNIQ_SENT then
                     PUT_LINE(F2,"Sending APERIODIC with uniq sect.");
                  else
                     PUT_LINE(F2,"APERIODIC response sent, no action.");
                  end if;
               end if;
            end if;
         when 3 =>
            SET_COL(F3,25);
            if M.DEST_NODE = 1 then
               PUT(F3,"S_M rcvd PERIODIC from Node #");
            else
               PUT(F3,"S_M rcvd APERIODIC from Node #");
            end if;
            PUT(F3,M.ORIG_FN_NODE,1);
            SET_COL(F3,60);
            PUT(F3,"EVNT #");
            PUT(F3,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F3,72);
            if M.DEST_NODE = 1 then
               PUT(F3,"Reset Timer element of Node #");
               PUT(F3,M.ORIG_FN_NODE,1);
               NEW_LINE(F3);
            else
               if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,Z) = 0 then
                  if RCVRY_COMPLETE then
                     PUT_LINE(F3,"Recovery complete,send PERIODIC msg");
                  else
                     PUT_LINE(F3,"This is the recovering node.");
                  end if;
               else
                  if LOC_VAR(Z).UNIQ_SENT and MY_UNIQ_SENT then
                     PUT_LINE(F3,"Sending APERIODIC with uniq sect.");
                  else
                     PUT_LINE(F3,"APERIODIC response sent, no action.");
                  end if;
               end if;
            end if;
         when 4 =>
            SET_COL(F4,25);
            if M.DEST_NODE = 1 then
               PUT(F4,"S_M rcvd PERIODIC from Node #");
            else
               PUT(F4,"S_M rcvd APERIODIC from Node #");
            end if;
            PUT(F4,M.ORIG_FN_NODE,1);
            SET_COL(F4,60);
            PUT(F4,"EVNT #");
            PUT(F4,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F4,72);
            if M.DEST_NODE = 1 then
               PUT(F4,"Reset Timer element of Node #");
               PUT(F4,M.ORIG_FN_NODE,1);
               NEW_LINE(F4);
            else
               if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,Z) = 0 then
                  if RCVRY_COMPLETE then
                     PUT_LINE(F4,"Recovery complete,send PERIODIC msg");
                  else
                     PUT_LINE(F4,"This is the recovering node.");
                  end if;
               else
                  if LOC_VAR(Z).UNIQ_SENT and MY_UNIQ_SENT then
                     PUT_LINE(F4,"Sending APERIODIC with uniq sect.");
                  else
                     PUT_LINE(F4,"APERIODIC response sent, no action.");
                  end if;
               end if;
            end if;
         when others =>
            NULL;
      end case;

      MY_UNIQ_SENT := false;
      if FLG then
         M := GM;
      end if;
   end STAT_MSG;

   -- Procedure Marker Message processes a MKR message utilized for
   -- the checkpointing process. It is called from the CHECK_PT
   -- task. The node's NST is updated with the contents of the
   -- message body. The procedure also generates a checkpoint
   -- complete message at the node originating the checkpoint to
   -- indicate a successful checkpoint.

   procedure MKR_MSG(M : in out MSG_RECORD; NID : in integer;
                     FLG : in out boolean) is
      X,Z,Y : integer;
      GM : MSG_RECORD;
      PT : float := 0.0;
   begin
      Z := NST(NID).NODE_ID;
      Y := M.ORIG_FN_NODE;
      if not LOC_VAR(Z).FIRST_MKR then
         LOC_VAR(Z).FIRST_MKR := true;
         if Y = Z then
            LOC_VAR(Z).CHKPT_ORIG := true;
            LOC_VAR(Z).CHKPT_TAKEN(Z) := 1;
            GET_REAL_TIME(0,PT);
            LOC_VAR(NID).CHKPT_TIMER := PT;
         else
            LOC_VAR(Z).CHKPT_ORIG := false;
         end if;
      end if;

      if Y /= Z then  -- not originating node of msg
         NST(Z).UNIQUE_SECTION(Y) := M.MSG_BODY.UNIQ;
         if LOC_VAR(Z).CHKPT_ORIG = true then  -- check point originator
            LOC_VAR(Z).CHKPT_TAKEN(Y) := 1;
            LOC_VAR(Z).CHKPT_COMPLETE := true;
            for I in 1..4 loop
               if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,I) = 1 then
                  -- node active
                  if LOC_VAR(Z).CHKPT_TAKEN(I) = 0 then
                     LOC_VAR(Z).CHKPT_COMPLETE := false;
                  end if;
               end if;
            end loop;
            if LOC_VAR(Z).CHKPT_COMPLETE = true then
               GM.MSG_KIND := CONTROL;
               GM.CNTRL_ACTION := CHKPT;
               GM.ORIG_FN_NODE := Z;
               FLG := true;
            end if;
         else  -- not originating node
            if not LOC_VAR(Z).LOCAL_CHKPT then  -- didn't send unique sect
               ST(Z) := NST(Z);
               GM.MSG_KIND := CONTROL;
               GM.CNTRL_ACTION := MKR;
               GM.ORIG_FN_NODE := Z;
               GM.MSG_BODY.UNIQ := NST(Z).UNIQUE_SECTION(Z);
               FLG := true;
               LOC_VAR(Z).LOCAL_CHKPT := true;  -- true if checkpointed
            end if;
         end if;
      end if;

      GET_REAL_TIME(Z,PT);
      case Z is
         when 1 =>
            SET_COL(F1,25);
            PUT(F1,"C_P rcvd MKR from Node #");
            PUT(F1,M.ORIG_FN_NODE,1);
            SET_COL(F1,60);
            PUT(F1,"EVNT #");
            PUT(F1,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F1,72);
            if LOC_VAR(Z).CHKPT_ORIG then
               if LOC_VAR(Z).CHKPT_COMPLETE then
                  PUT_LINE(F1,"MKRs rcvd from all nodes,Send CHKPT_COMP");
               else
                  PUT_LINE(F1,"I originated CHKPT. Not all MKRs yet rcvd");
               end if;
            else
               if not LOC_VAR(Z).LOCAL_CHKPT then
                  PUT_LINE(F1,"Local CHKPT conducted. Send uniq in MKR.");
               else
                  PUT_LINE(F1,"Local CHKPT already conducted. Store UNIQ");
               end if;
            end if;
         when 2 =>
            SET_COL(F2,25);
            PUT(F2,"C_P rcvd MKR from Node #");
            PUT(F2,M.ORIG_FN_NODE,1);
            SET_COL(F2,60);
            PUT(F2,"EVNT #");
            PUT(F2,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F2,72);
            if LOC_VAR(Z).CHKPT_ORIG then
               if LOC_VAR(Z).CHKPT_COMPLETE then
                  PUT_LINE(F2,"MKRs rcvd from all nodes,Send CHKPT_COMP");
               else
                  PUT_LINE(F2,"I originated CHKPT. Not all MKRs yet rcvd");
               end if;
            else
               if not LOC_VAR(Z).LOCAL_CHKPT then
                  PUT_LINE(F2,"Local CHKPT conducted. Send uniq in MKR.");
               else
                  PUT_LINE(F2,"Local CHKPT already conducted. Store UNIQ");
               end if;
            end if;
         when 3 =>
            SET_COL(F3,25);
            PUT(F3,"C_P rcvd MKR from Node #");
            PUT(F3,M.ORIG_FN_NODE,1);
            SET_COL(F3,60);
            PUT(F3,"EVNT #");
            PUT(F3,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F3,72);
            if LOC_VAR(Z).CHKPT_ORIG then
               if LOC_VAR(Z).CHKPT_COMPLETE then
                  PUT_LINE(F3,"MKRs rcvd from all nodes,Send CHKPT_COMP");
               else
                  PUT_LINE(F3,"I originated CHKPT. Not all MKRs yet rcvd");
               end if;
            else
               if not LOC_VAR(Z).LOCAL_CHKPT then
                  PUT_LINE(F3,"Local CHKPT conducted. Send uniq in MKR.");
               else
                  PUT_LINE(F3,"Local CHKPT already conducted. Store UNIQ");
               end if;
            end if;
         when 4 =>
            SET_COL(F4,25);
            PUT(F4,"C_P rcvd MKR from Node #");
            PUT(F4,M.ORIG_FN_NODE,1);
            SET_COL(F4,60);
            PUT(F4,"EVNT #");
            PUT(F4,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F4,72);
            if LOC_VAR(Z).CHKPT_ORIG then
               if LOC_VAR(Z).CHKPT_COMPLETE then
                  PUT_LINE(F4,"MKRs rcvd from all nodes,Send CHKPT_COMP");
               else
                  PUT_LINE(F4,"I originated CHKPT. Not all MKRs yet rcvd");
               end if;
            else
               if not LOC_VAR(Z).LOCAL_CHKPT then
                  PUT_LINE(F4,"Local CHKPT conducted. Send uniq in MKR.");
               else
                  PUT_LINE(F4,"Local CHKPT already conducted. Store UNIQ");
               end if;
            end if;
         when others =>
            NULL;
      end case;
      if FLG then
         M := GM;
      end if;
   end MKR_MSG;

   -- Procedure Checkpoint Complete Message processes a CHKPT message
   -- that was built in the Marker Message section. It resets all
   -- flags set during the checkpointing process, and it copies
   -- checkpoint data into the backup NST (NSTBAK).

   procedure CHK_PT_CMPLT_MSG(M : in MSG_RECORD; NID : in integer) is
      Z,Y : integer := M.ORIG_FN_NODE;
      PT : float := 0.0;
   begin
      NSTBAK(NID) := ST(NID);
      Z := NST(NID).NODE_ID;
      LOC_VAR(NID).FIRST_MKR := FALSE;
      LOC_VAR(NID).CHKPT_ORIG := FALSE;
      GET_REAL_TIME(Z,PT);
      LOC_VAR(NID).CHKPT_TIMER := PT;
      GET_REAL_TIME(Z,PT);
      case Z is
         when 1 =>
            SET_COL(F1,25);
            PUT(F1,"C_P rcvd CHKPT from Node #");
            PUT(F1,M.ORIG_FN_NODE,1);
            SET_COL(F1,60);
            PUT(F1,"EVNT #");
            PUT(F1,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F1,72);
            if Z = Y then
               PUT_LINE(F1,"CHKPT orig. Global CHKPT complete store NST");
            else
               PUT_LINE(F1,"Global CHKPT complete store NST");
            end if;
         when 2 =>
            SET_COL(F2,25);
            PUT(F2,"C_P rcvd CHKPT from Node #");
            PUT(F2,M.ORIG_FN_NODE,1);
            SET_COL(F2,60);
            PUT(F2,"EVNT #");
            PUT(F2,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F2,72);
            if Z = Y then
               PUT_LINE(F2,"CHKPT orig. Global CHKPT complete store NST");
            else
               PUT_LINE(F2,"Global CHKPT complete store NST");
            end if;
         when 3 =>
            SET_COL(F3,25);
            PUT(F3,"C_P rcvd CHKPT from Node #");
            PUT(F3,M.ORIG_FN_NODE,1);
            SET_COL(F3,60);
            PUT(F3,"EVNT #");
            PUT(F3,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F3,72);
            if Z = Y then
               PUT_LINE(F3,"CHKPT orig. Global CHKPT complete store NST");
            else
               PUT_LINE(F3,"Global CHKPT complete store NST");
            end if;
         when 4 =>
            SET_COL(F4,25);
            PUT(F4,"C_P rcvd CHKPT from Node #");
            PUT(F4,M.ORIG_FN_NODE,1);
            SET_COL(F4,60);
            PUT(F4,"EVNT #");
            PUT(F4,M.MSG_BODY.UNIQ(1).SYMBOL_VAR,4);
            SET_COL(F4,72);
            if Z = Y then
               PUT_LINE(F4,"CHKPT orig. Global CHKPT complete store NST");
            else
               PUT_LINE(F4,"Global CHKPT complete store NST");
            end if;
         when others =>
            NULL;
      end case;

      if NST(NID).NODE_ID = Y then  -- CHKPT orig clears MKR array
         for I in 1..4 loop
            LOC_VAR(NID).CHKPT_TAKEN(I) := 0;
         end loop;
      end if;
   end CHK_PT_CMPLT_MSG;
end PROCESS;
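
-- Illustrative sketch only; it is not part of the original simulation.
-- Every procedure in PROCESS repeats the same logging statements once
-- per node file (F1 .. F4). One way to express that pattern a single
-- time is to index an array of files by node number. The names
-- Log_Demo, Logs, Log_Event and the "nodeN.log" file names are
-- hypothetical, and the sketch assumes an Ada 95 or later compiler
-- (Ada.Text_IO) rather than the Ada 83 environment of the listing.

with Ada.Text_IO; use Ada.Text_IO;

procedure Log_Demo is
   subtype Node_Id is Integer range 1 .. 4;

   -- One log file per node, selected by index instead of a case statement.
   Logs : array (Node_Id) of File_Type;

   -- Writes one event line using the column layout of the listing:
   -- message text at column 25, event number at column 60.
   procedure Log_Event (Nid   : in Node_Id;
                        Text  : in String;
                        Event : in Natural) is
   begin
      Set_Col (Logs (Nid), 25);
      Put (Logs (Nid), Text);
      Set_Col (Logs (Nid), 60);
      Put (Logs (Nid), "EVNT #" & Natural'Image (Event));
      New_Line (Logs (Nid));
   end Log_Event;
begin
   for N in Node_Id loop
      Create (Logs (N), Out_File,
              "node" & Character'Val (Character'Pos ('0') + N) & ".log");
   end loop;

   Log_Event (2, "R_L rcvd FN_ON from Node # 1", 7);

   for N in Node_Id loop
      Close (Logs (N));
   end loop;
end Log_Demo;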


with FLOAT_INOUT; use FLOAT_INOUT;
with MATH; use MATH;
with RANDOM; use RANDOM;
with PROCESS; use PROCESS;
with TEXT_IO, integer_io;
use TEXT_IO, integer_io;
package TRAND is

   -- Procedure Test Random is a random integer generator
   -- which normalizes the random variable to the desired
   -- range as indicated by the parameter.

   procedure TEST_RANDOM(VAR : in out integer);
end TRAND;

package body TRAND is

   procedure TEST_RANDOM(VAR : in out integer) is
      X : float;
   begin
      delay 2.0;
      X := RANDOM.NEXT_NUMBER;
      if VAR = 4 then
         VAR := integer(X * 4.0);
         while VAR = 0 loop  -- VAR must be an integer in the
                             -- interval 1-4 (# of node)
            delay 1.0;
            X := RANDOM.NEXT_NUMBER;  -- calls the function
            VAR := integer(X * 4.0);
         end loop;
      else
         if VAR = 12 then
            VAR := integer(X * 12.0);
            while VAR = 0 loop  -- VAR must be an integer in the
                                -- interval 1-12 (# of function)
               delay 1.0;
               X := RANDOM.NEXT_NUMBER;  -- calls the function
               VAR := integer(X * 12.0);
            end loop;
         else
            -- get a delay parameter
            VAR := integer(-(1.0/0.5) * NAT_LOG(1.0 - X));
            while VAR = 0 loop  -- the delay must be an integer
                                -- greater than 0.
               delay 1.0;
               X := RANDOM.NEXT_NUMBER;  -- calls the function
               VAR := integer(X * 4.0);
            end loop;
         end if;
      end if;
   end TEST_RANDOM;
end TRAND;
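
-- Illustrative sketch only; it is not part of the original simulation.
-- TEST_RANDOM scales a uniform variate and retries whenever the rounded
-- result is 0. Because an Ada float-to-integer conversion rounds to the
-- nearest integer, that approach does not give every value in 1 .. N the
-- same probability. The sketch below shows an equal-width mapping and an
-- exponential delay that is rounded up so no retry loop is needed. It
-- assumes an Ada 95 or later compiler and substitutes the standard
-- Ada.Numerics packages for the RANDOM and MATH packages of the listing;
-- Scale_Demo, Uniform_Index and Exp_Delay are hypothetical names.

with Ada.Text_IO;
with Ada.Numerics.Float_Random;
with Ada.Numerics.Elementary_Functions;

procedure Scale_Demo is
   use Ada.Numerics.Float_Random;
   use Ada.Numerics.Elementary_Functions;

   Gen : Generator;

   -- Maps X in 0.0 .. 1.0 onto 1 .. N using N bins of equal width.
   function Uniform_Index (X : Float; N : Positive) return Positive is
      K : constant Integer := Integer (Float'Floor (X * Float (N))) + 1;
   begin
      return Integer'Min (K, N);  -- covers the X = 1.0 edge case
   end Uniform_Index;

   -- Exponentially distributed delay with rate Lambda, rounded up so
   -- that the result is always at least 1.
   function Exp_Delay (X : Float; Lambda : Float) return Positive is
      Safe : constant Float := Float'Min (X, 0.999_999);  -- keeps Log finite
      D    : constant Float := -Log (1.0 - Safe) / Lambda;
   begin
      return Integer'Max (1, Integer (Float'Ceiling (D)));
   end Exp_Delay;
begin
   Reset (Gen, 42);  -- fixed seed so that runs are repeatable
   for I in 1 .. 5 loop
      declare
         X : constant Float := Random (Gen);
      begin
         Ada.Text_IO.Put_Line
           ("node" & Integer'Image (Uniform_Index (X, 4)) &
            "  function" & Integer'Image (Uniform_Index (X, 12)) &
            "  delay" & Integer'Image (Exp_Delay (X, 0.5)));
      end;
   end loop;
end Scale_Demo;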


with DECLARATIONS; use DECLARATIONS;
package COMMNET is
   task NETWORK is
      entry SEND_MSG(M : in MSG_RECORD; NID : in integer);
   end;
end COMMNET;


-- The following package statements create instantiations of the
-- indicated package utilized in the formation of a node.

with OUTS;
package OUTS1 is new OUTS;
with OUTS;
package OUTS2 is new OUTS;
with OUTS;
package OUTS3 is new OUTS;
with OUTS;
package OUTS4 is new OUTS;
with INS;
package INS1 is new INS;
with INS;
package INS2 is new INS;
with INS;
package INS3 is new INS;
with INS;
package INS4 is new INS;
with SM;
package SM1 is new SM;
with SM;
package SM2 is new SM;
with SM;
package SM3 is new SM;
with SM;
package SM4 is new SM;
with CKPT;
package CKPT1 is new CKPT;
with CKPT;
package CKPT2 is new CKPT;
with CKPT;
package CKPT3 is new CKPT;
with CKPT;
package CKPT4 is new CKPT;
with RL;
package RL1 is new RL;
with RL;
package RL2 is new RL;
with RL;
package RL3 is new RL;
with RL;
package RL4 is new RL;

with text_io; use text_io;
with integer_io; use integer_io;
with number_io; use number_io;
with DECLARATIONS; use DECLARATIONS;
with PROCESS; use PROCESS;
with TRAND; use TRAND;
with INS1; use INS1;
with INS2; use INS2;
with INS3; use INS3;
with INS4; use INS4;

package body COMMNET is

   -- The NETWORK task manages a circular queue, receiving messages
   -- from the Output Server task and relaying them to all the
   -- Input Server tasks. It serves as the communication interface
   -- between nodes.

   task body NETWORK is
      W,R : integer;
      MGEN : MSG_RECORD;
      MSG_PRESENT : boolean := false;
      DT : DURATION := 2.57;
   begin
      loop
         select
            accept SEND_MSG(M : in MSG_RECORD; NID : in integer) do
               NULL;
            end;
         or
            delay DT;
            MSG_PRESENT := false;
            W := NET_Q.MSG_CNT;
            R := NET_Q.RD_CNT;
            if NET_Q.MSG_TO_SEND then
               if R > W then
                  MGEN := NET_Q.MSG_QUE(R);
                  R := R + 1;
                  if R > Q_SIZE then
                     if W < 2 then
                        NET_Q.MSG_TO_SEND := false;
                        NET_Q.BLOCK_WRITE := false;
                     end if;
                     NET_Q.RD_CNT := 1;
                  else
                     NET_Q.RD_CNT := R;
                  end if;
               else
                  if R < W then
                     MGEN := NET_Q.MSG_QUE(R);
                     R := R + 1;
                     if W = R then
                        NET_Q.BLOCK_WRITE := false;
                        NET_Q.MSG_TO_SEND := false;
                     end if;
                     NET_Q.RD_CNT := R;
                  end if;
               end if;
               MSG_PRESENT := true;
            end if;

            if MSG_PRESENT then
               for Z in 1..4 loop
                  W := LOC_VAR(Z).INQ.MSG_CNT;
                  R := LOC_VAR(Z).INQ.RD_CNT;
                  if not LOC_VAR(Z).INQ.BLOCK_WRITE then
                     if W >= R then
                        LOC_VAR(Z).INQ.MSG_QUE(W) := MGEN;
                        LOC_VAR(Z).INQ.MSG_TO_SEND := true;
                        W := W + 1;
                        if W > Q_SIZE then
                           if R < 2 then
                              LOC_VAR(Z).INQ.BLOCK_WRITE := true;
                           end if;
                           LOC_VAR(Z).INQ.MSG_CNT := 1;
                        else
                           LOC_VAR(Z).INQ.MSG_CNT := W;
                        end if;
                     else
                        if W < R then
                           LOC_VAR(Z).INQ.MSG_QUE(W) := MGEN;
                           LOC_VAR(Z).INQ.MSG_TO_SEND := true;
                           W := W + 1;
                           if W = R then
                              LOC_VAR(Z).INQ.BLOCK_WRITE := true;
                           end if;
                           LOC_VAR(Z).INQ.MSG_CNT := W;
                        end if;
                     end if;
                  end if;
               end loop;  -- end for loop
            end if;
         end select;
      end loop;
   end NETWORK;
end COMMNET;
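
-- Illustrative sketch only; it is not part of the original simulation.
-- The NETWORK task keeps a write index (MSG_CNT), a read index (RD_CNT),
-- a "messages pending" flag and a "block write" flag for each circular
-- queue. The sketch below is a simplified rendering of that idea, not the
-- author's exact wrap-around logic: the queue is full when the writer
-- catches up with the reader and empty when the reader catches up with
-- the writer. Ring_Demo, Ring, Write and Read are hypothetical names, the
-- capacity of 4 is an assumption, and integers stand in for MSG_RECORD.
-- It assumes an Ada 95 or later compiler (Ada.Text_IO).

with Ada.Text_IO;

procedure Ring_Demo is
   Q_Size : constant := 4;  -- hypothetical capacity

   type Slot_Array is array (1 .. Q_Size) of Integer;

   type Ring is record
      Que         : Slot_Array := (others => 0);
      Msg_Cnt     : Positive := 1;      -- next slot to write
      Rd_Cnt      : Positive := 1;      -- next slot to read
      Msg_To_Send : Boolean := False;   -- true while the queue is non-empty
      Block_Write : Boolean := False;   -- true while the queue is full
   end record;

   Q    : Ring;
   Item : Integer;
   Ok   : Boolean;

   procedure Write (R : in out Ring; Item : in Integer; Ok : out Boolean) is
   begin
      Ok := not R.Block_Write;
      if Ok then
         R.Que (R.Msg_Cnt) := Item;
         R.Msg_Cnt := R.Msg_Cnt mod Q_Size + 1;   -- wraps Q_Size back to 1
         R.Msg_To_Send := True;
         R.Block_Write := R.Msg_Cnt = R.Rd_Cnt;   -- writer caught the reader
      end if;
   end Write;

   procedure Read (R : in out Ring; Item : out Integer; Ok : out Boolean) is
   begin
      Ok := R.Msg_To_Send;
      Item := 0;
      if Ok then
         Item := R.Que (R.Rd_Cnt);
         R.Rd_Cnt := R.Rd_Cnt mod Q_Size + 1;
         R.Block_Write := False;
         R.Msg_To_Send := R.Rd_Cnt /= R.Msg_Cnt;  -- reader caught the writer
      end if;
   end Read;
begin
   -- Attempt one more write than the queue can hold, then drain it.
   for I in 1 .. Q_Size + 1 loop
      Write (Q, I * 10, Ok);
      Ada.Text_IO.Put_Line
        ("write" & Integer'Image (I * 10) & " accepted: " & Boolean'Image (Ok));
   end loop;
   loop
      Read (Q, Item, Ok);
      exit when not Ok;
      Ada.Text_IO.Put_Line ("read" & Integer'Image (Item));
   end loop;
end Ring_Demo;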


with DECLARATIONS; use DECLARATIONS;
generic
package INS is
   task NODE_INITIALIZER is
      entry BUILD_NODE(NID : in integer);
   end;

   task INPUT_SERVER is
      entry RECEIVE_MSG(M : in MSG_RECORD; NID : in integer);
   end;
end INS;


with text_io; use text_io;
with integer_io; use integer_io;
with number_io; use number_io;
with PROCESS; use PROCESS;
with DECLARATIONS; use DECLARATIONS;
with COMMNET; use COMMNET;
with TRAND; use TRAND;
with RL1; use RL1;
with RL2; use RL2;
with RL3; use RL3;
with RL4; use RL4;
with SM1; use SM1;
with SM2; use SM2;
with SM3; use SM3;
with SM4; use SM4;
with CKPT1; use CKPT1;
with CKPT2; use CKPT2;
with CKPT3; use CKPT3;
with CKPT4; use CKPT4;

package body INS is

   -- The NODE_INITIALIZER task is utilized to initialize the node's NST,
   -- to be utilized in the simulation process.

   task body NODE_INITIALIZER is
      x,z : integer;
   begin
      loop
         select
            accept BUILD_NODE(NID : in integer) do
               x := 1;
               z := NID;
               -- this loop builds the function location array - this
               -- would normally be initialized by the task allocation
               -- which is only done in pseudo code at this time
               for J in 1..12 loop
                  NST(z).COMMON_SECTION.FN_LOC(J) := x;
                  x := x + 1;
                  if x = 5 then
                     x := 1;
                  end if;
               end loop;
               NST(z).NODE_ID := NID;
               -- this loop initializes all nodes to the "up" status
               -- within each of the NST's
               for J in 1..4 loop
                  NST(z).COMMON_SECTION.NODE_STAT_LD(1,J) := 1;
                  NST(z).COMMON_SECTION.NODE_STAT_LD(2,J) := J;
               end loop;
               NSTBAK(z) := NST(z);  -- make backup copy of NST's
            end;
         or
            terminate;
         end select;
      end loop;
   end;

   -- The INPUT_SERVER task accepts messages from the NETWORK task.
   -- It parses the message fields and calls the appropriate task
   -- to process the message.

   task body INPUT_SERVER is
      Z,W,R,i : integer;
      MGEN : MSG_RECORD;
      PT : float := 0.0;
      MSG_PRESENT : boolean := false;
      DT : DURATION := 1.35;
   begin
      loop
         select
            -- msg being accepted from the network
            accept RECEIVE_MSG(M : in MSG_RECORD; NID : in integer) do
               Z := NST(NID).NODE_ID;
            end;
         or
            delay DT;
            MSG_PRESENT := false;
            W := LOC_VAR(Z).INQ.MSG_CNT;
            R := LOC_VAR(Z).INQ.RD_CNT;
            if LOC_VAR(Z).INQ.MSG_TO_SEND then
               if R > W then
                  MGEN := LOC_VAR(Z).INQ.MSG_QUE(R);
                  R := R + 1;
                  if R > Q_SIZE then
                     if W < 2 then
                        LOC_VAR(Z).INQ.MSG_TO_SEND := false;
                        LOC_VAR(Z).INQ.BLOCK_WRITE := false;
                     end if;
                     LOC_VAR(Z).INQ.RD_CNT := 1;
                  else
                     LOC_VAR(Z).INQ.RD_CNT := R;
                  end if;
               else
                  if R < W then
                     MGEN := LOC_VAR(Z).INQ.MSG_QUE(R);
                     R := R + 1;
                     if W = R then
                        LOC_VAR(Z).INQ.BLOCK_WRITE := false;
                        LOC_VAR(Z).INQ.MSG_TO_SEND := false;
                     end if;
                     LOC_VAR(Z).INQ.RD_CNT := R;
                  end if;
               end if;
               MSG_PRESENT := true;
            end if;

            if MSG_PRESENT then
               LOC_VAR(Z).EVNT_CNT := LOC_VAR(Z).EVNT_CNT + 1;
               GET_REAL_TIME(0,PT);
               MGEN.TOR := PT;
               case Z is  -- call specific section of own node
                  when 1 =>
                     case MGEN.CNTRL_ACTION is
                        when MKR | CHKPT =>
                           if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,1) = 1 then
                              CKPT1.CHECK_PT.MARKER_MSG(MGEN,1);
                           end if;
                        when FNON | FNOFF =>
                           if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,1) = 1 then
                              RL1.RECONF_LAYER.IS_MSG_IN(MGEN,1);
                           end if;
                        when STATUS =>
                           SM1.STATUS_REC.STAT_MSG_REC(MGEN,1);
                        when others =>
                           NULL;
                     end case;
                  when 2 =>
                     case MGEN.CNTRL_ACTION is
                        when MKR | CHKPT =>
                           if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,2) = 1 then
                              CKPT2.CHECK_PT.MARKER_MSG(MGEN,2);
                           end if;
                        when FNON | FNOFF =>
                           if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,2) = 1 then
                              RL2.RECONF_LAYER.IS_MSG_IN(MGEN,2);
                           end if;
                        when STATUS =>
                           SM2.STATUS_REC.STAT_MSG_REC(MGEN,2);
                        when others =>
                           NULL;
                     end case;
                  when 3 =>
                     case MGEN.CNTRL_ACTION is
                        when MKR | CHKPT =>
                           if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,3) = 1 then
                              CKPT3.CHECK_PT.MARKER_MSG(MGEN,3);
                           end if;
                        when FNON | FNOFF =>
                           if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,3) = 1 then
                              RL3.RECONF_LAYER.IS_MSG_IN(MGEN,3);
                           end if;
                        when STATUS =>
                           SM3.STATUS_REC.STAT_MSG_REC(MGEN,3);
                        when others =>
                           NULL;
                     end case;
                  when 4 =>
                     case MGEN.CNTRL_ACTION is
                        when MKR | CHKPT =>
                           if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,4) = 1 then
                              CKPT4.CHECK_PT.MARKER_MSG(MGEN,4);
                           end if;
                        when FNON | FNOFF =>
                           if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,4) = 1 then
                              RL4.RECONF_LAYER.IS_MSG_IN(MGEN,4);
                           end if;
                        when STATUS =>
                           SM4.STATUS_REC.STAT_MSG_REC(MGEN,4);
                        when others =>
                           NULL;
                     end case;
                  when others =>
                     NULL;
               end case;
            end if;
         end select;
      end loop;
   end;

end INS;


with DECLARATIONS; use DECLARATIONS;
generic
package OUTS is
   task OUTPUT_SERVER is
      entry START_OUTPUT(M : in MSG_RECORD; NID : in integer);
   end;
end OUTS;


with text_io; use text_io;
with integer_io; use integer_io;
with number_io; use number_io;
with PROCESS; use PROCESS;
with TRAND; use TRAND;
with DECLARATIONS; use DECLARATIONS;
with COMMNET; use COMMNET;
package body OUTS is

   -- The OUTPUT_SERVER task relays messages from the various tasks
   -- within the node to the communication layer (NETWORK task).
   -- The task serializes a node's messages and ensures that the
   -- NETWORK can accept them.

   task body OUTPUT_SERVER is
      Z,W,R : integer;
      MGEN : MSG_RECORD;
      PT : float := 0.0;
      MSG_PRESENT : boolean := false;
      DT : DURATION := 3.83;
   begin
      loop
         select
            accept START_OUTPUT(M : in MSG_RECORD; NID : in integer) do
               Z := NST(NID).NODE_ID;
            end;
         or
            delay DT;
            MSG_PRESENT := false;
            W := LOC_VAR(Z).OUTQ.MSG_CNT;
            R := LOC_VAR(Z).OUTQ.RD_CNT;
            if LOC_VAR(Z).OUTQ.MSG_TO_SEND then
               if R > W then
                  MGEN := LOC_VAR(Z).OUTQ.MSG_QUE(R);
                  R := R + 1;
                  if R > Q_SIZE then
                     if W < 2 then
                        LOC_VAR(Z).OUTQ.MSG_TO_SEND := false;
                        LOC_VAR(Z).OUTQ.BLOCK_WRITE := false;
                     end if;
                     LOC_VAR(Z).OUTQ.RD_CNT := 1;
                  else
                     LOC_VAR(Z).OUTQ.RD_CNT := R;
                  end if;
               else
                  if R < W then
                     MGEN := LOC_VAR(Z).OUTQ.MSG_QUE(R);
                     R := R + 1;
                     if W = R then
                        LOC_VAR(Z).OUTQ.BLOCK_WRITE := false;
                        LOC_VAR(Z).OUTQ.MSG_TO_SEND := false;
                     end if;
                     LOC_VAR(Z).OUTQ.RD_CNT := R;
                  end if;
               end if;
               MSG_PRESENT := true;
            end if;

if  MSG.PRESENT  then 
GET.REAL_TIME(0,PT) ; 

MGEN. TOT  :=  PT; 

LOC.VAR(Z) .EVNT.CNT.OUT  :=  L0C_VAR(Z) .EVNT.CNT.OUf  +  1; 
MGEN.MSG.BODY.UNiqd)  .Sl'MBOL.VAR  ;=  L0C_VAR(Z)  .EVNT.CNT.OUT; 
W  :=  NET.q.MSG.CNT; 
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R  :=  NET.q.RD.CNT; 
if  not  NET.Q.BLOCK.WRITE  then 
if  W  >*  R  then 
NET.Q.MSG.QUECW)  :*  MGEN; 
NET.Q.MSG.TO.SEND  ;■  true; 

W  W  1; 
if  W  >  Q_SIZE  then 
if  R  <  2  then 
NET.q.BLOCK_WRITE  :=  true; 
end  if; 

NET.q.MSG.CNT  :*  1; 
else 

NET_q.MSG.CNT  :*  W; 
end  if; 
else 

if  W  <  R  then 
NET_q.MSG.qUE(W)  :»  MGEN; 
NET.q.MSG.TO.SEND  :=  true; 

W  :=  W  +  1; 
if  W  =  R  then 
NET.q. BLOCK. WRITE  :=  true; 
end  if; 

NET.q. MSG.CNT  :=  W; 
end  if; 
end  if; 
end  if; 
               case Z is
                  when 1 =>
                     GET_REAL_TIME(1,PT);
                     SET_COL(F1,25);
                     PUT(F1,"O_S sending ");
                     case MGEN.CNTRL_ACTION is
                        when MKR =>
                           PUT(F1,"MKR msg.");
                        when FNON =>
                           PUT(F1,"FNON msg.");
                        when FNOFF =>
                           PUT(F1,"FNOFF to Node #");
                           PUT(F1,MGEN.DEST_NODE,1);
                        when STATUS =>
                           PUT(F1,"STATUS msg.");
                        when CHKPT =>
                           PUT(F1,"CHKPT msg.");
                        when others =>
                           NULL;
                     end case;
                     SET_COL(F1,60);
                     PUT(F1,"EVNT #");
                     PUT(F1,LOC_VAR(Z).EVNT_CNT_OUT,4);
                     NEW_LINE(F1);
                  when 2 =>
                     GET_REAL_TIME(2,PT);
                     SET_COL(F2,25);
                     PUT(F2,"O_S sending ");
                     case MGEN.CNTRL_ACTION is
                        when MKR =>
                           PUT(F2,"MKR msg.");
                        when FNON =>
                           PUT(F2,"FNON msg.");
                        when FNOFF =>
                           PUT(F2,"FNOFF to Node #");
                           PUT(F2,MGEN.DEST_NODE,1);
                        when STATUS =>
                           PUT(F2,"STATUS msg.");
                        when CHKPT =>
                           PUT(F2,"CHKPT msg.");
                        when others =>
                           NULL;
                     end case;
                     SET_COL(F2,60);
                     PUT(F2,"EVNT #");
                     PUT(F2,LOC_VAR(Z).EVNT_CNT_OUT,4);
                     NEW_LINE(F2);
                  when 3 =>
                     GET_REAL_TIME(3,PT);
                     SET_COL(F3,25);
                     PUT(F3,"O_S sending ");
                     case MGEN.CNTRL_ACTION is
                        when MKR =>
                           PUT(F3,"MKR msg.");
                        when FNON =>
                           PUT(F3,"FNON msg.");
                        when FNOFF =>
                           PUT(F3,"FNOFF to Node #");
                           PUT(F3,MGEN.DEST_NODE,1);
                        when STATUS =>
                           PUT(F3,"STATUS msg.");
                        when CHKPT =>
                           PUT(F3,"CHKPT msg.");
                        when others =>
                           NULL;
                     end case;
                     SET_COL(F3,60);
                     PUT(F3,"EVNT #");
                     PUT(F3,LOC_VAR(Z).EVNT_CNT_OUT,4);
                     NEW_LINE(F3);
                  when 4 =>
                     GET_REAL_TIME(4,PT);
                     SET_COL(F4,25);
                     PUT(F4,"O_S sending ");
                     case MGEN.CNTRL_ACTION is
                        when MKR =>
                           PUT(F4,"MKR msg.");
                        when FNON =>
                           PUT(F4,"FNON msg.");
                        when FNOFF =>
                           PUT(F4,"FNOFF to Node #");
                           PUT(F4,MGEN.DEST_NODE,1);
                        when STATUS =>
                           PUT(F4,"STATUS msg.");
                        when CHKPT =>
                           PUT(F4,"CHKPT msg.");
                        when others =>
                           NULL;
                     end case;
                     SET_COL(F4,60);
                     PUT(F4,"EVNT #");
                     PUT(F4,LOC_VAR(Z).EVNT_CNT_OUT,4);
                     NEW_LINE(F4);
                  when others =>
                     NULL;
               end case;
            end if;  -- end if msg present
         end select;
      end loop;
   end;

end OUTS;


with DECLARATIONS; use DECLARATIONS;
generic
package CKPT is
   task CHECK_PT is
      entry MARKER_MSG(M : in MSG_RECORD; NID : in integer);
      entry CHKPT_COMP(M : in MSG_RECORD; NID : in integer);
   end;

   task EVENT_CNT is
      entry EVNT_CNT_FULL(NID : in integer);
   end;
end CKPT;


with text_io; use text_io;
with integer_io; use integer_io;
with number_io; use number_io;
with PROCESS; use PROCESS;
with DECLARATIONS; use DECLARATIONS;
with COMMNET; use COMMNET;
package body CKPT is

   -- The CHECK_PT task is called by the INPUT_SERVER when a
   -- marker (MKR) or checkpoint complete (CHKPT) message is
   -- received. This task calls MKR_MSG or CHK_PT_CMPLT_MSG
   -- respectively, for further processing of the messages.

   task body CHECK_PT is
      MGEN  : MSG_RECORD;
      FLG   : boolean;
      Z,W,R : integer;
   begin
      loop
         select
            accept MARKER_MSG(M : in MSG_RECORD; NID : in integer) do
               Z := NST(NID).NODE_ID;
               MGEN := M;
               FLG := FALSE;
               case M.CNTRL_ACTION is
                  when MKR =>
                     PROCESS.MKR_MSG(MGEN, Z, FLG);
                     if FLG then
                        W := LOC_VAR(Z).OUTQ.MSG_CNT;
                        R := LOC_VAR(Z).OUTQ.RD_CNT;
                        if not LOC_VAR(Z).OUTQ.BLOCK_WRITE then
                           if W >= R then
                              LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                              LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                              W := W + 1;
                              if W > Q_SIZE then
                                 if R < 2 then
                                    LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                                 end if;
                                 LOC_VAR(Z).OUTQ.MSG_CNT := 1;
                              else
                                 LOC_VAR(Z).OUTQ.MSG_CNT := W;
                              end if;
                           else
                              if W < R then
                                 LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                                 LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                                 W := W + 1;
                                 if W = R then
                                    LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                                 end if;
                                 LOC_VAR(Z).OUTQ.MSG_CNT := W;
                              end if;
                           end if;
                        end if;
                     end if;
                  when CHKPT =>
                     Z := NST(NID).NODE_ID;
                     PROCESS.CHK_PT_CMPLT_MSG(M, Z);
                  when others =>
                     null;
               end case;
            end;
         or
            terminate;
         end select;
      end loop;
   end;

   -- The EVENT_CNT task monitors the events at a node and originates
   -- the checkpoint process once a predetermined number of events has
   -- occurred.

   task body EVENT_CNT is
      MGEN  : MSG_RECORD;
      FLG   : boolean;
      Z,W,R : integer;
      CNT   : integer := 10;
      PT    : float := 0.0;
   begin
      loop
         select
            accept EVNT_CNT_FULL(NID : in integer) do
               Z := NST(NID).NODE_ID;  -- initialize for simulation
               CNT := CNT + NID;       -- operator illegible in the scan; "+" assumed
            end;
         or
            delay 33.7;
            GET_REAL_TIME(0,PT);
            if LOC_VAR(Z).CHKPT_ORIG and
               PT - LOC_VAR(Z).CHKPT_TIMER > 68.1 then
               LOC_VAR(Z).LOCAL_CHKPT := false;
               LOC_VAR(Z).FIRST_MKR := FALSE;
               LOC_VAR(Z).CHKPT_ORIG := FALSE;
               LOC_VAR(Z).CHKPT_TIMER := PT;
               for I in 1..4 loop
                  LOC_VAR(Z).CHKPT_TAKEN(I) := 0;
               end loop;
               case Z is
                  when 1 =>
                     GET_REAL_TIME(1,PT);
                     SET_COL(F1,72);
                     PUT_LINE(F1,"CHKPT unsuccessful. Restarting CHKPT");
                  when 2 =>
                     GET_REAL_TIME(2,PT);
                     SET_COL(F2,72);
                     PUT_LINE(F2,"CHKPT unsuccessful. Restarting CHKPT");
                  when 3 =>
                     GET_REAL_TIME(3,PT);
                     SET_COL(F3,72);
                     PUT_LINE(F3,"CHKPT unsuccessful. Restarting CHKPT");
                  when 4 =>
                     GET_REAL_TIME(4,PT);
                     SET_COL(F4,72);
                     PUT_LINE(F4,"CHKPT unsuccessful. Restarting CHKPT");
                  when others =>
                     NULL;
               end case;
            end if;

            if LOC_VAR(Z).EVNT_CNT > CNT and
               not LOC_VAR(Z).LOCAL_CHKPT then
               ST(Z) := NST(Z);
               MGEN.ORIG_FN_NODE := Z;
               MGEN.MSG_KIND := control;
               MGEN.CNTRL_ACTION := MKR;
               LOC_VAR(Z).EVNT_CNT := 0;
               MGEN.MSG_BODY.UNIQ := NST(Z).UNIQUE_SECTION(Z);
               LOC_VAR(Z).LOCAL_CHKPT := true;
               LOC_VAR(Z).CHKPT_TIMER := PT;
               W := LOC_VAR(Z).OUTQ.MSG_CNT;
               R := LOC_VAR(Z).OUTQ.RD_CNT;
               if not LOC_VAR(Z).OUTQ.BLOCK_WRITE then
                  if W >= R then
                     LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                     LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                     W := W + 1;
                     if W > Q_SIZE then
                        if R < 2 then
                           LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                        end if;
                        LOC_VAR(Z).OUTQ.MSG_CNT := 1;
                     else
                        LOC_VAR(Z).OUTQ.MSG_CNT := W;
                     end if;
                  else
                     if W < R then
                        LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                        LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                        W := W + 1;
                        if W = R then
                           LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                        end if;
                        LOC_VAR(Z).OUTQ.MSG_CNT := W;
                     end if;
                  end if;
               end if;
            end if;
         end select;
      end loop;
   end;

end CKPT;


with DECLARATIONS; use DECLARATIONS;
generic
package RL is
   task RECONF_LAYER is
      entry IS_MSG_IN(M : in MSG_RECORD; NID : in integer);
   end;
end RL;


with text_io; use text_io;
with integer_io; use integer_io;
with number_io; use number_io;
with PROCESS; use PROCESS;
with DECLARATIONS; use DECLARATIONS;
with COMMNET; use COMMNET;
package body RL is

   -- The RECONF_LAYER task is called by the INPUT_SERVER task
   -- to process both FNON and FNOFF messages.
   -- It calls procedures FN_ON_REC and FN_OFF_REC to process
   -- these types of messages.

   task body RECONF_LAYER is
      -- specific calls may need to pass a msg back out
      -- if so, set the msg flag
      MSG_FLAG : boolean := FALSE;
      MGEN     : MSG_RECORD;
      Z,C,W,R  : integer;
   begin
      loop
         select
            -- input server calls R_L with a msg to send
            accept IS_MSG_IN(M : in MSG_RECORD; NID : in integer) do
               Z := NST(NID).NODE_ID;
               MGEN := M;
               -- the R_L determines whether a fn needs to be started or terminated
               -- in the active fn queue - it will notify the application layer to
               -- take the required action
               case M.CNTRL_ACTION is
                  when FNON =>
                     PROCESS.FN_ON_MSG(M, NID);
                  when FNOFF =>
                     PROCESS.FN_OFF_MSG(MGEN, Z, MSG_FLAG);
                     if MSG_FLAG then  -- msg needs to go to O_S but
                                       -- will add msg to out queue
                                       -- to get processed by O_S
                        W := LOC_VAR(Z).OUTQ.MSG_CNT;
                        R := LOC_VAR(Z).OUTQ.RD_CNT;
                        if not LOC_VAR(Z).OUTQ.BLOCK_WRITE then
                           if W >= R then
                              LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                              LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                              W := W + 1;
                              if W > Q_SIZE then
                                 if R < 2 then
                                    LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                                 end if;
                                 LOC_VAR(Z).OUTQ.MSG_CNT := 1;
                              else
                                 LOC_VAR(Z).OUTQ.MSG_CNT := W;
                              end if;
                           else
                              if W < R then
                                 LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                                 LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                                 W := W + 1;
                                 if W = R then
                                    LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                                 end if;
                                 LOC_VAR(Z).OUTQ.MSG_CNT := W;
                              end if;
                           end if;
                        end if;
                        MSG_FLAG := FALSE;
                     end if;
                  when others =>
                     NULL;
               end case;
            end;
         or
            terminate;
         end select;
      end loop;
   end;
end RL;


with DECLARATIONS; use DECLARATIONS;
generic
package SM is
   task STATUS_REC is
      entry STAT_MSG_REC(M : in MSG_RECORD; NID : in integer);
   end;

   task STATUS_BDCST is
      entry STAT_BDCST_CHK(NID : in integer);
   end;
end SM;


with FLOAT_INOUT; use FLOAT_INOUT;
with text_io; use text_io;
with integer_io; use integer_io;
with number_io; use number_io;
with PROCESS; use PROCESS;
with DECLARATIONS; use DECLARATIONS;
with COMMNET; use COMMNET;
package body SM is

   -- The STATUS_BDCST task generates periodic status messages
   -- for the node. Also incorporated in this task is the
   -- Timeout routine, which implements node failure detection.

   task body STATUS_BDCST is
      MGEN    : MSG_RECORD;
      FLG     : boolean;
      SB      : boolean := false;
      Z,C,W,R : integer;
      PT      : float := 0.0;
   begin
      loop
         select
            accept STAT_BDCST_CHK(NID : in integer) do
               Z := NST(NID).NODE_ID;
            end;
         or
            delay 15.0;
            GET_REAL_TIME(0,PT);
            for I in 1..4 loop
               if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,I) = 1 and
                  PT - LOC_VAR(Z).TIMER(I) > 65.0 then
                  NST(Z).COMMON_SECTION.NODE_STAT_LD(1,I) := 0;
                  case Z is
                     when 1 =>
                        GET_REAL_TIME(1,PT);
                        SET_COL(F1,25);
                        PUT(F1,"S_M detects FAILURE on Node #");
                        PUT(F1,I,1);
                        SET_COL(F1,72);
                        PUT_LINE(F1,"Notify NF task.");
                     when 2 =>
                        GET_REAL_TIME(2,PT);
                        SET_COL(F2,25);
                        PUT(F2,"S_M detects FAILURE on Node #");
                        PUT(F2,I,1);
                        SET_COL(F2,72);
                        PUT_LINE(F2,"Notify NF task.");
                     when 3 =>
                        GET_REAL_TIME(3,PT);
                        SET_COL(F3,25);
                        PUT(F3,"S_M detects FAILURE on Node #");
                        PUT(F3,I,1);
                        SET_COL(F3,72);
                        PUT_LINE(F3,"Notify NF task.");
                     when 4 =>
                        GET_REAL_TIME(4,PT);
                        SET_COL(F4,25);
                        PUT(F4,"S_M detects FAILURE on Node #");
                        PUT(F4,I,1);
                        SET_COL(F4,72);
                        PUT_LINE(F4,"Notify NF task.");
                     when others =>
                        NULL;
                  end case;
               end if;
            end loop;

            if NST(Z).COMMON_SECTION.NODE_STAT_LD(1,Z) = 1
               and not FAILED_NODE(Z) then
               if PT - LOC_VAR(Z).TIMER(Z) > 44.0 then
                  MGEN.DEST_NODE := 1;
                  MGEN.DEST_FUNC := Z;
                  MGEN.CNTRL_ACTION := STATUS;
                  MGEN.ORIG_FN_NODE := Z;
                  MGEN.MSG_KIND := control;
                  W := LOC_VAR(Z).OUTQ.MSG_CNT;
                  R := LOC_VAR(Z).OUTQ.RD_CNT;
                  if not LOC_VAR(Z).OUTQ.BLOCK_WRITE then
                     if W >= R then
                        LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                        LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                        W := W + 1;
                        if W > Q_SIZE then
                           if R < 2 then
                              LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                           end if;
                           LOC_VAR(Z).OUTQ.MSG_CNT := 1;
                        else
                           LOC_VAR(Z).OUTQ.MSG_CNT := W;
                        end if;
                     else
                        if W < R then
                           LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                           LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                           W := W + 1;
                           if W = R then
                              LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                           end if;
                           LOC_VAR(Z).OUTQ.MSG_CNT := W;
                        end if;
                     end if;
                  end if;
               end if;
            end if;
         end select;
      end loop;
   end;

   -- The STATUS_REC task is called by the INPUT_SERVER when a
   -- status message is received. In turn this task calls the
   -- STATUS_MSG procedure for further processing.

   task body STATUS_REC is
      MGEN    : MSG_RECORD;
      FLG     : boolean;
      SB      : boolean := false;
      Z,C,W,R : integer;
      PT      : float := 0.0;
   begin
      loop
         select
            accept STAT_MSG_REC(M : in MSG_RECORD; NID : in integer) do
               Z := NST(NID).NODE_ID;
               MGEN := M;
               FLG := FALSE;
               LOC_VAR(Z).TIMER(MGEN.ORIG_FN_NODE) := M.TOT;
               PROCESS.STAT_MSG(MGEN, Z, FLG);
               if FLG then
                  W := LOC_VAR(Z).OUTQ.MSG_CNT;
                  R := LOC_VAR(Z).OUTQ.RD_CNT;
                  if not LOC_VAR(Z).OUTQ.BLOCK_WRITE then
                     if W >= R then
                        LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                        LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                        W := W + 1;
                        if W > Q_SIZE then
                           if R < 2 then
                              LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                           end if;
                           LOC_VAR(Z).OUTQ.MSG_CNT := 1;
                        else
                           LOC_VAR(Z).OUTQ.MSG_CNT := W;
                        end if;
                     else
                        if W < R then
                           LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                           LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                           W := W + 1;
                           if W = R then
                              LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                           end if;
                           LOC_VAR(Z).OUTQ.MSG_CNT := W;
                        end if;
                     end if;
                  end if;
               end if;
            end;
         or
            terminate;
         end select;
      end loop;
   end;
end SM;
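
The timeout test in STATUS_BDCST (mark a node as failed when no status message has arrived within 65.0 time units) can be isolated into a small stand-alone sketch. This is an illustration only, not part of the thesis code: the names Heartbeat_Timeout_Demo, Last_Heard, Alive, Record_Status and Check_Timeouts are hypothetical, simulated times are passed in as plain Float values instead of being read with GET_REAL_TIME, and every node is assumed to have been heard from at time 0.0.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Heartbeat_Timeout_Demo is
   Node_Count : constant := 4;
   Timeout    : constant Float := 65.0;  -- same threshold as the listing's failure test

   type Node_Index is range 1 .. Node_Count;
   type Time_Table  is array (Node_Index) of Float;
   type Alive_Table is array (Node_Index) of Boolean;

   -- Hypothetical stand-ins for LOC_VAR(Z).TIMER and NODE_STAT_LD.
   Last_Heard : Time_Table  := (others => 0.0);
   Alive      : Alive_Table := (others => True);

   -- Called when a status message from node From arrives at time Now.
   procedure Record_Status (From : in Node_Index; Now : in Float) is
   begin
      Last_Heard (From) := Now;
   end Record_Status;

   -- Periodic check: flag every live node that has been silent too long.
   procedure Check_Timeouts (Now : in Float) is
   begin
      for I in Node_Index loop
         if Alive (I) and then Now - Last_Heard (I) > Timeout then
            Alive (I) := False;
            Put_Line ("failure detected on node" & Node_Index'Image (I));
         end if;
      end loop;
   end Check_Timeouts;

begin
   Record_Status (1, 10.0);
   Record_Status (2, 60.0);
   -- Nodes 3 and 4 have been silent since time 0.0.
   Check_Timeouts (70.0);
end Heartbeat_Timeout_Demo;
```

With these inputs only nodes 3 and 4 exceed the threshold at time 70.0, so only they are flagged. The listing additionally clears the status entry in every node's NST and writes to the per-node output file, which the sketch leaves out.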

with DECLARATIONS; use DECLARATIONS;
package FP is
   task EVENT_MAKER is
      entry NEW_EVENT(NID : in integer);
   end;
end FP;


with FLOAT_INOUT; use FLOAT_INOUT;
with text_io; use text_io;
with integer_io; use integer_io;
with number_io; use number_io;
with TRAND; use TRAND;
with calendar; use calendar;
with DECLARATIONS; use DECLARATIONS;
with PROCESS; use PROCESS;
package body FP is

   -- The EVENT_MAKER task is utilized to simulate an actual
   -- distributed processing system.

   task body EVENT_MAKER is
      MGEN, outmsg  : MSG_RECORD;
      x,Z,W,R       : integer;
      N             : integer := 0;
      EN,ON,DN      : integer;
      MSG_BUF_EMPTY : boolean := false;
      MSG_PRESENT   : boolean := false;
      PT            : float := 0.0;
      ST            : DURATION := 63.15;
   begin  -- begin Front_End Processor
      loop
         select
            accept NEW_EVENT(NID : in integer) do
               Z := NID;
            end;
         or
            delay ST;
            N := N + 1;
            MSG_PRESENT := false;
            EN := 12;
            TRAND.TEST_RANDOM(EN);
            EN := EN mod 2;
            case EN is
               when 1 =>
                  MSG_PRESENT := true;
                  outmsg.CNTRL_ACTION := FNOFF;
                  ON := 4;
                  TRAND.TEST_RANDOM(ON);  -- get an active random orig node
                  while NST(Z).COMMON_SECTION.NODE_STAT_LD(1,ON) = 0 loop
                     delay 2.0;
                     ON := 4;
                     TRAND.TEST_RANDOM(ON);
                  end loop;  -- end while loop
                  outmsg.ORIG_FN_NODE := ON;
                  DN := 4;
                  TRAND.TEST_RANDOM(DN);  -- get an active random dest
                                          -- node that is not = to the orig node
                  while NST(Z).COMMON_SECTION.NODE_STAT_LD(1,DN) = 0
                        or DN = ON loop
                     delay 2.0;
                     DN := 4;
                     TRAND.TEST_RANDOM(DN);
                  end loop;  -- end while loop
                  outmsg.DEST_NODE := DN;
                  x := 1;  -- get an active fn from orig. node
                  while NST(Z).COMMON_SECTION.FN_LOC(x) /= ON
                        and x < 13 loop
                     x := x + 1;
                  end loop;
                  if x < 13 then
                     outmsg.DEST_FUNC := x;
                  else
                     MSG_PRESENT := false;
                  end if;
                  outmsg.MSG_BODY.UNIQ(1).REGISTER_VAL := DN;
                  outmsg.MSG_KIND := CONTROL;
               when 0 =>
                  ON := 4;
                  TRAND.TEST_RANDOM(ON);
                  while NST(Z).COMMON_SECTION.NODE_STAT_LD(1,ON) = 0 loop
                     ON := 4;
                     TRAND.TEST_RANDOM(ON);
                  end loop;  -- end while loop
                  if not FAILED_NODE(ON) then
                     FAILED_NODE(ON) := true;
                  end if;
                  case ON is
                     when 1 =>
                        GET_REAL_TIME(1,PT);
                        SET_COL(F1,25);
                        PUT_LINE(F1,"FP generating Node FAILURE");
                     when 2 =>
                        GET_REAL_TIME(2,PT);
                        SET_COL(F2,25);
                        PUT_LINE(F2,"FP generating Node FAILURE");
                     when 3 =>
                        GET_REAL_TIME(3,PT);
                        SET_COL(F3,25);
                        PUT_LINE(F3,"FP generating Node FAILURE");
                     when 4 =>
                        GET_REAL_TIME(4,PT);
                        SET_COL(F4,25);
                        PUT_LINE(F4,"FP generating Node FAILURE");
                     when others =>
                        NULL;
                  end case;
                  MSG_PRESENT := false;
               when others =>
                  null;
            end case;

            if MSG_PRESENT then
               MGEN := outmsg;
               Z := MGEN.ORIG_FN_NODE;
               W := LOC_VAR(Z).OUTQ.MSG_CNT;
               R := LOC_VAR(Z).OUTQ.RD_CNT;
               if not LOC_VAR(Z).OUTQ.BLOCK_WRITE then
                  LOC_VAR(Z).OUTQ.MSG_QUE(W) := MGEN;
                  LOC_VAR(Z).OUTQ.MSG_TO_SEND := true;
                  W := W + 1;
                  if W > Q_SIZE then
                     LOC_VAR(Z).OUTQ.MSG_CNT := 1;
                  end if;
                  if W = R then
                     LOC_VAR(Z).OUTQ.BLOCK_WRITE := true;
                  else
                     LOC_VAR(Z).OUTQ.MSG_CNT := W;
                  end if;
               end if;
            end if;
         end select;
      end loop;
   end;
end FP;
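
The same queue-insertion pattern (write at MSG_CNT, advance with wrap-around at Q_SIZE, set BLOCK_WRITE when the write index catches up with RD_CNT) is repeated inline in most of the tasks above. As a reading aid, the following is a minimal stand-alone sketch of that idea as a bounded circular queue. It is a simplified illustration under stated assumptions, not the thesis implementation: the names Ring_Queue, Enqueue and Dequeue are hypothetical, the element type is Integer instead of MSG_RECORD, and the capacity of 4 is chosen only for the demonstration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Ring_Queue_Demo is
   Q_Size : constant := 4;  -- assumed capacity; the listing takes Q_SIZE from DECLARATIONS

   type Slot_Array is array (1 .. Q_Size) of Integer;

   type Ring_Queue is record
      Slots       : Slot_Array := (others => 0);
      Write_Index : Positive := 1;      -- plays the role of MSG_CNT
      Read_Index  : Positive := 1;      -- plays the role of RD_CNT
      Full        : Boolean  := False;  -- plays the role of BLOCK_WRITE
      Pending     : Boolean  := False;  -- plays the role of MSG_TO_SEND
   end record;

   Q    : Ring_Queue;
   Ok   : Boolean;
   Item : Integer;

   -- Store Item at the write index unless the queue is full.
   procedure Enqueue (Q : in out Ring_Queue; Item : in Integer; Ok : out Boolean) is
   begin
      if Q.Full then
         Ok := False;
         return;
      end if;
      Q.Slots (Q.Write_Index) := Item;
      Q.Pending := True;
      Q.Write_Index := (Q.Write_Index mod Q_Size) + 1;  -- wrap from Q_Size to 1
      Q.Full := Q.Write_Index = Q.Read_Index;           -- writer caught up with reader
      Ok := True;
   end Enqueue;

   -- Remove the oldest item unless the queue is empty.
   procedure Dequeue (Q : in out Ring_Queue; Item : out Integer; Ok : out Boolean) is
   begin
      if not Q.Pending then
         Item := 0;
         Ok := False;
         return;
      end if;
      Item := Q.Slots (Q.Read_Index);
      Q.Read_Index := (Q.Read_Index mod Q_Size) + 1;
      Q.Full := False;
      Q.Pending := Q.Read_Index /= Q.Write_Index;       -- reader caught up: empty
      Ok := True;
   end Dequeue;

begin
   -- Five enqueue attempts against a capacity of four: the last one is refused.
   for I in 1 .. 5 loop
      Enqueue (Q, I * 10, Ok);
      Put_Line ("enqueue" & Integer'Image (I * 10) & " accepted: " & Boolean'Image (Ok));
   end loop;
   Dequeue (Q, Item, Ok);
   Put_Line ("dequeued" & Integer'Image (Item));
end Ring_Queue_Demo;
```

Collecting the logic in one pair of procedures keeps the full and empty conditions in a single place. In the listing the queues are shared between tasks, so a real replacement would also need mutual exclusion, which this sketch does not attempt.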


with text_io; use text_io;
with integer_io; use integer_io;
with number_io; use number_io;
with FLOAT_INOUT; use FLOAT_INOUT;
with calendar; use calendar;
with DECLARATIONS; use DECLARATIONS;
with PROCESS; use PROCESS;
with COMMNET; use COMMNET;
with FP; use FP;
with OUTS1; use OUTS1;
with OUTS2; use OUTS2;
with OUTS3; use OUTS3;
with OUTS4; use OUTS4;
with INS1; use INS1;
with INS2; use INS2;
with INS3; use INS3;
with INS4; use INS4;
with SM1; use SM1;
with SM2; use SM2;
with SM3; use SM3;
with SM4; use SM4;
with RL1; use RL1;
with RL2; use RL2;
with RL3; use RL3;
with RL4; use RL4;
with CKPT1; use CKPT1;
with CKPT2; use CKPT2;
with CKPT3; use CKPT3;
with CKPT4; use CKPT4;

-- The procedure FEP is utilized to open individual
-- output files for each node. It also initiates each node's
-- NST for simulation purposes and assigns each task its
-- node identification number.

procedure FEP is
   MGEN, outmsg : MSG_RECORD;
   Z,W,R        : integer;
   PT           : float := 0.0;
begin  -- begin Front-End Processor
   OPEN(F1, MODE=>OUT_FILE, NAME=>"NOUT1");
   OPEN(F2, MODE=>OUT_FILE, NAME=>"NOUT2");
   OPEN(F3, MODE=>OUT_FILE, NAME=>"NOUT3");
   OPEN(F4, MODE=>OUT_FILE, NAME=>"NOUT4");
   INS1.NODE_INITIALIZER.BUILD_NODE(1);
   INS2.NODE_INITIALIZER.BUILD_NODE(2);
   INS3.NODE_INITIALIZER.BUILD_NODE(3);
   INS4.NODE_INITIALIZER.BUILD_NODE(4);
   GET_REAL_TIME(0,PT);
   for L in 1..4 loop
      for N in 1..4 loop  -- initialize periodic time array
                          -- of each node
         -- operator inside float(...) is illegible in the scan; "*" assumed
         LOC_VAR(L).TIMER(N) := PT + float(N * 0.1);
      end loop;
      case L is  -- give identity to tasks within packages
         when 1 =>
            SM1.STATUS_BDCST.STAT_BDCST_CHK(1);
            CKPT1.EVENT_CNT.EVNT_CNT_FULL(1);
            INS1.INPUT_SERVER.RECEIVE_MSG(outmsg, 1);
            OUTS1.OUTPUT_SERVER.START_OUTPUT(outmsg, 1);
         when 2 =>
            SM2.STATUS_BDCST.STAT_BDCST_CHK(2);
            CKPT2.EVENT_CNT.EVNT_CNT_FULL(2);
            INS2.INPUT_SERVER.RECEIVE_MSG(outmsg, 2);
            OUTS2.OUTPUT_SERVER.START_OUTPUT(outmsg, 2);
         when 3 =>
            SM3.STATUS_BDCST.STAT_BDCST_CHK(3);
            CKPT3.EVENT_CNT.EVNT_CNT_FULL(3);
            INS3.INPUT_SERVER.RECEIVE_MSG(outmsg, 3);
            OUTS3.OUTPUT_SERVER.START_OUTPUT(outmsg, 3);
         when 4 =>
            SM4.STATUS_BDCST.STAT_BDCST_CHK(4);
            CKPT4.EVENT_CNT.EVNT_CNT_FULL(4);
            INS4.INPUT_SERVER.RECEIVE_MSG(outmsg, 4);
            OUTS4.OUTPUT_SERVER.START_OUTPUT(outmsg, 4);
         when others =>
            NULL;
      end case;
   end loop;
   FP.EVENT_MAKER.NEW_EVENT(1);
end FEP;


APPENDIX B: SIMULATION OUTPUT


/* The output is given in its entirety. The specific events         */
/* pertaining to this thesis have been provided in timing           */
/* diagrams listed in previous chapters.                            */
/* The first column indicates the time of occurrence. Column two    */
/* specifies which node is active, and column three indicates what  */
/* event is taking place. Column four designates the event number   */
/* of the node which sent the message. The node which sent the      */
/* message is listed in the previous column. The last column,       */
/* which appears on a new line, explains what action is done at     */
/* the active node (column two).                                    */


39429.76000  Node #1  O_S sending STATUS msg.  EVNT # 1
39432.64000  Node #1  S_M rcvd PERIODIC from Node #1  EVNT # 1
             Reset Timer element of Node #1
39435.37000  Node #1  S_M rcvd PERIODIC from Node #2  EVNT # 1
             Reset Timer element of Node #2
39438.11000  Node #1  S_M rcvd PERIODIC from Node #3  EVNT # 1
             Reset Timer element of Node #3
39440.85000  Node #1  S_M rcvd PERIODIC from Node #4  EVNT # 1
             Reset Timer element of Node #4
39450.88000  Node #1  FP generating Node FAILURE
39492.55000  Node #1  S_M rcvd PERIODIC from Node #3  EVNT # 2
             Reset Timer element of Node #3
39495.29000  Node #1  S_M rcvd PERIODIC from Node #4  EVNT # 2
             Reset Timer element of Node #4
39498.03000  Node #1  S_M rcvd PERIODIC from Node #2  EVNT # 2
             Reset Timer element of Node #2
39503.76000  Node #1  S_M detects FAILURE on Node #1
             Notify NF task.
39551.09000  Node #1  S_M rcvd PERIODIC from Node #3  EVNT # 3
             Reset Timer element of Node #3
39552.63000  Node #1  O_S sending STATUS msg.  EVNT # 2
39553.81000  Node #1  S_M rcvd PERIODIC from Node #4  EVNT # 4
             Reset Timer element of Node #4
39556.53000  Node #1  S_M rcvd PERIODIC from Node #2  EVNT # 4
             Reset Timer element of Node #2
39559.25000  Node #1  S_M rcvd APERIODIC from Node #1  EVNT # 2
             This is the recovering node.
39561.97000  Node #1  S_M rcvd APERIODIC from Node #3  EVNT # 4
             This is the recovering node.
39564.69000  Node #1  S_M rcvd APERIODIC from Node #4  EVNT # 5
             This is the recovering node.
39567.41000  Node #1  S_M rcvd APERIODIC from Node #2  EVNT # 5
             Recovery complete - send PERIODIC msg.
39567.99000  Node #1  O_S sending STATUS msg.  EVNT # 3
39570.13000  Node #1  S_M rcvd PERIODIC from Node #1  EVNT # 3
             Reset Timer element of Node #1
39587.19000  Node #1  O_S sending MKR msg.  EVNT # 4
39590.53000  Node #1  C_P rcvd MKR from Node #1  EVNT # 4
             I originated CHKPT. Not all MKRs yet rcvd.
39593.25000  Node #1  C_P rcvd MKR from Node #3  EVNT # 5
             I originated CHKPT. Not all MKRs yet rcvd.
39594.87000  Node #1  O_S sending FNOFF to Node #2  EVNT # 5
39595.97000  Node #1  C_P rcvd MKR from Node #4  EVNT # 6
             I originated CHKPT. Not all MKRs yet rcvd.
39598.69000  Node #1  C_P rcvd MKR from Node #2  EVNT # 6
             MKRs rcvd from all nodes. Send CHKPT_COMP
39598.71000  Node #1  O_S sending CHKPT msg.  EVNT # 6
39600.05000  Node #1  R_L rcvd FN_OFF from Node #1  EVNT # 5
             No further action required ATT.
39602.77000  Node #1  C_P rcvd CHKPT from Node #1  EVNT # 6
             CHKPT orig. Global CHKPT complete store NST
39605.49000  Node #1  R_L rcvd FN_ON from Node #2  EVNT # 7
             I am the deactivating node and changing NST
             223212341234
39610.93000  Node #1  S_M rcvd PERIODIC from Node #3  EVNT # 6
             Reset Timer element of Node #3
39625.58000  Node #1  O_S sending STATUS msg.  EVNT # 7
39625.89000  Node #1  S_M rcvd PERIODIC from Node #1  EVNT # 7
             Reset Timer element of Node #1
39628.61000  Node #1  S_M rcvd PERIODIC from Node #4  EVNT # 7
             Reset Timer element of Node #4
39631.33000  Node #1  S_M rcvd PERIODIC from Node #2  EVNT # 8
             Reset Timer element of Node #2

39429.76000  Node #2  O_S sending STATUS msg.  EVNT # 1
39432.66000  Node #2  S_M rcvd PERIODIC from Node #1  EVNT # 1
             Reset Timer element of Node #1
39435.39000  Node #2  S_M rcvd PERIODIC from Node #2  EVNT # 1
             Reset Timer element of Node #2
39438.13000  Node #2  S_M rcvd PERIODIC from Node #3  EVNT # 1
             Reset Timer element of Node #3
39440.87000  Node #2  S_M rcvd PERIODIC from Node #4  EVNT # 1
             Reset Timer element of Node #4
39491.22000  Node #2  O_S sending STATUS msg.  EVNT # 2
39492.57000  Node #2  S_M rcvd PERIODIC from Node #3  EVNT # 2
             Reset Timer element of Node #3
39495.31000  Node #2  S_M rcvd PERIODIC from Node #4  EVNT # 2
             Reset Timer element of Node #4
39498.05000  Node #2  S_M rcvd PERIODIC from Node #2  EVNT # 2
             Reset Timer element of Node #2
39503.76000  Node #2  S_M detects FAILURE on Node #1
             Notify NF task.
39523.90000  Node #2  R_L rcvd FN_OFF from Node #4  EVNT # 3
             FN_ON sent to activate FN # 4

39525.78000  Node  #2  0_S  sending  FNON  msg.  EVNT  #  3 

39528.00000  Node  #2  R_L  rcvd  FN.ON  from  Node  #2  EVNT  i  3 

I  am  the  activating  node  and  changing  NST. 
123212341234 

39548.80900  Node  «2  O.S  sending  STATUS  msg.  EVNT  *  4 

39551.17900  Node  #2  S.M  rcvd  PERIODIC  from  Node  #3  EVNT  #  3 

Reset  Timer  element  of  Node  *3 

39553.91000  Node  #2  S.M  rcvd  PERIODIC  from  Node  #4  EVNT  i  4 

Reset  Timer  element  of  Node  #4 

39556.64000  Node  «2  S.M  rcvd  PERIODIC  from  Node  #2  EVNT  *  4 

Reset  Timer  element  of  Node  #2 

39559.37000  Node  #2  S.M  rcvd  APERIODIC  from  Node  #1  EVNT  «  2 

Sending  APERIODIC  vith  NST  unique  sections. 

39560.32000  Node  #2  O.S  sending  STATUS  msg.  EVNT  «  5 

39562.11000  Node  #2  S.M  rcvd  APERIODIC  from  Node  #3  EVNT  «  4 

APERIODIC  response  already  sent,  no  action. 

39564.84000  Node  #2  S.M  rcvd  APERIODIC  from  Node  #4  EVNT  «  5 

APERIODIC  response  already  sent,  no  action. 

39567.57000  Node  #2  S.M  rcvd  APERIODIC  from  Node  #2  EVNT  #  5 

APERIODIC  response  already  sent,  no  action. 

39570.30000  Node  #2  S.M  rcvd  PERIODIC  from  Node  #1  EVNT  «  3 

Reset  Timer  element  of  Node  #1 

39590.71000  Node  #2  C.P  rcvd  MKR  from  Node  #1  EVNT  #  4 

Local  CHKPT  already  conducted.  Store  UNIQ. 

39591.04000  Node  #2  O.S  sending  MKR  msg.  EVNT*  6 

39593.44000  Node  #2  C.P  rcvd  MKR  from  Node  #3  EVNT  *  5 

Local  CHKPT  already  conducted.  Store  UNIQ. 

39596 . 17000  Node  #2  C.P  rcvd  MKR  from  Node  #4  EVNT  «  6  ^ 

Local  CHKPT  already  conducted.  Store  UNIQ. 

39597.54000  Node  #2  C.P  rcvd  MKR  from  Node  #2  EVNT  *  6 

Local  CHKPT  already  conducted.  Store  UNIQ.  * 

39600.27000  Node  #2  R.L  rcvd  FN.OFF  from  Node  #1  EVNT  »  5 

FN.ON  sent  to  activate  FN  #  1 

39602.54000  Node  #2  O.S  sending  FNON  msg.  EVNT  *  7 

39603.00000  Node  #2  C.P  rcvd  CHKPT  from  Node  #1  EVNT  #  6 

Global  CHKPT  complete  store  NST 

39605.74000  Node  #2  R.L  rcvd  FN.ON  from  Node  #2  EVNT  #  7 

I  am  the  activating  node  and  changing  NST. 
223212341234 

39611.20000  Node  #2  S.M  rcvd  PERIODIC  from  Node  *3  EVNT  »  6 

Reset  Timer  element  of  Node  *3 

39625.59000  Node  #2  O.S  sending  STATUS  msg.  EVNT  *  8 

39626.17000  Node  #2  S.M  rcvd  PERIODIC  from  Node  #1  EVNT  *  7 

Reset  Timer  element  of  Node  #1 

39628.90000  Node  #2  S.M  rcvd  PERIODIC  from  Node  #4  EVNT  «  7 

Reset  Timer  element  of  Node  #4 

39631.63000  Node  #2  S.M  rcvd  PERIODIC  from  Node  #2  EVNT  *  8 

Reset  Timer  element  of  Node  *2 
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39429.77000  Node  #3  0_S  sending  STATUS  msg.  EVNT  «  1 

39432.65000  Node  #3  S_M  rcvd  PERIODIC  from  Node  #1  EVNT  «  1 

Reset  Timer  element  of  Node  il 

39435.37900  Node  #3  S.M  rcvd  PERIODIC  from  Node  #2  EVNT  «  1 

Reset  Timer  element  of  Node  *2 

39438.12000  Node  #3  S.M  rcvd  PERIODIC  from  Node  #3  EVNT#  1 

Reset  Timer  element  of  Node  *3 

*  39440.86000  Node  #3  S.M  rcvd  PERIODIC  from  Node  #4  EVNT  «  1 

Reset  Timer  element  of  Node  #4 

39491.19000  Node  #3  O.S  sending  STATUS  msg.  EVNT  f  2 

39492.56000  Node  #3  S.M  rcvd  PERIODIC  from  Node  #3  EVNT  «  2 

Reset  Timer  element  of  Node  #3 

39495.30000  Node  #3  S.M  rcvd  PERIODIC  from  Node  #4  EVNT  #  2 

Reset  Timer  element  of  Node  #4 

39498.04000  Node  #3  S.M  rcvd  PERIODIC  from  Node  #2  EVNT  #  2 

Reset  Timer  element  of  Node  #2 
39503.76900  Node  #3  S.M  detects  FAILURE  on  Node  #1 

Notify  NF  task. 

39523.89000  Node  #3  R.L  rcvd  FN.OFF  from  Node  #4  EVNT  «  3 

No  further  action  required  ATT. 

39527.99000  Node  #3  R.L  rcvd  FN.ON  from  Node  #2  EVNT  #  3 

Neither  act/deact  node  and  changing  NST. 
123212341234 

39548.80000 Node #3 O_S sending STATUS msg. EVNT # 3
39551.16000 Node #3 S_M rcvd PERIODIC from Node #3 EVNT # 3
    Reset Timer element of Node #3
39553.90000 Node #3 S_M rcvd PERIODIC from Node #4 EVNT # 4
    Reset Timer element of Node #4
39556.63000 Node #3 S_M rcvd PERIODIC from Node #2 EVNT # 4
    Reset Timer element of Node #2
39559.36000 Node #3 S_M rcvd APERIODIC from Node #1 EVNT # 2
    Sending APERIODIC with NST unique sections.
39560.31000 Node #3 O_S sending STATUS msg. EVNT # 4
39562.10000 Node #3 S_M rcvd APERIODIC from Node #3 EVNT # 4
    APERIODIC response already sent, no action.
39564.83000 Node #3 S_M rcvd APERIODIC from Node #4 EVNT # 5
    APERIODIC response already sent, no action.
39567.56000 Node #3 S_M rcvd APERIODIC from Node #2 EVNT # 5
    APERIODIC response already sent, no action.
39570.29000 Node #3 S_M rcvd PERIODIC from Node #1 EVNT # 3
    Reset Timer element of Node #1

39590.70000 Node #3 C_P rcvd MKR from Node #1 EVNT # 4
    Local CHKPT already conducted. Store UNIQ.
39591.03000 Node #3 O_S sending MKR msg. EVNT # 5
39593.43000 Node #3 C_P rcvd MKR from Node #3 EVNT # 5
    Local CHKPT already conducted. Store UNIQ.
39596.16000 Node #3 C_P rcvd MKR from Node #4 EVNT # 6
    Local CHKPT already conducted. Store UNIQ.
39597.53000 Node #3 C_P rcvd MKR from Node #2 EVNT # 6
    Local CHKPT already conducted. Store UNIQ.
39600.26000 Node #3 R_L rcvd FN_OFF from Node #1 EVNT # 5
    No further action required ATT.
39602.99000 Node #3 C_P rcvd CHKPT from Node #1 EVNT # 6
    Global CHKPT complete store NST
39605.73000 Node #3 R_L rcvd FN_ON from Node #2 EVNT # 7
    Neither act/deact node and changing NST.
    223212341234

39610.22000 Node #3 O_S sending STATUS msg. EVNT # 6
39611.19000 Node #3 S_M rcvd PERIODIC from Node #3 EVNT # 6
    Reset Timer element of Node #3
39626.16000 Node #3 S_M rcvd PERIODIC from Node #1 EVNT # 7
    Reset Timer element of Node #1
39628.89000 Node #3 S_M rcvd PERIODIC from Node #4 EVNT # 7
    Reset Timer element of Node #4
39631.62000 Node #3 S_M rcvd PERIODIC from Node #2 EVNT # 8
    Reset Timer element of Node #2

39429.78000 Node #4 O_S sending STATUS msg. EVNT # 1
39432.66000 Node #4 S_M rcvd PERIODIC from Node #1 EVNT # 1
    Reset Timer element of Node #1
39435.38000 Node #4 S_M rcvd PERIODIC from Node #2 EVNT # 1
    Reset Timer element of Node #2
39438.12000 Node #4 S_M rcvd PERIODIC from Node #3 EVNT # 1
    Reset Timer element of Node #3
39440.86000 Node #4 S_M rcvd PERIODIC from Node #4 EVNT # 1
    Reset Timer element of Node #4
39491.22000 Node #4 O_S sending STATUS msg. EVNT # 2
39492.56000 Node #4 S_M rcvd PERIODIC from Node #3 EVNT # 2
    Reset Timer element of Node #3
39495.30000 Node #4 S_M rcvd PERIODIC from Node #4 EVNT # 2
    Reset Timer element of Node #4
39498.04000 Node #4 S_M rcvd PERIODIC from Node #2 EVNT # 2
    Reset Timer element of Node #2
39503.77000 Node #4 S_M detects FAILURE on Node #1
    Notify NF task.
39521.94000 Node #4 O_S sending FN_OFF to Node #2 EVNT # 3
39523.90000 Node #4 R_L rcvd FN_OFF from Node #4 EVNT # 3
    No further action required ATT.
39528.00000 Node #4 R_L rcvd FN_ON from Node #2 EVNT # 3
    I am the deactivating node and changing NST
    123212341234
39548.80000 Node #4 O_S sending STATUS msg. EVNT # 4
39551.17000 Node #4 S_M rcvd PERIODIC from Node #3 EVNT # 3
    Reset Timer element of Node #3
39553.90900 Node #4 S_M rcvd PERIODIC from Node #4 EVNT # 4
    Reset Timer element of Node #4
39556.63900 Node #4 S_M rcvd PERIODIC from Node #2 EVNT # 4
    Reset Timer element of Node #2
39559.37000 Node #4 S_M rcvd APERIODIC from Node #1 EVNT # 2
    Sending APERIODIC with NST unique sections.
39560.31000 Node #4 O_S sending STATUS msg. EVNT # 5
39562.10900 Node #4 S_M rcvd APERIODIC from Node #3 EVNT # 4
    APERIODIC response already sent, no action.
39564.84000 Node #4 S_M rcvd APERIODIC from Node #4 EVNT # 5
    APERIODIC response already sent, no action.
39567.57000 Node #4 S_M rcvd APERIODIC from Node #2 EVNT # 5
    APERIODIC response already sent, no action.
39570.29900 Node #4 S_M rcvd PERIODIC from Node #1 EVNT # 3
    Reset Timer element of Node #1
39590.70000 Node #4 C_P rcvd MKR from Node #1 EVNT #
    Local CHKPT already conducted. Store UNIQ.
39591.03000 Node #4 O_S sending MKR msg. EVNT # 6
39593.43000 Node #4 C_P rcvd MKR from Node #3 EVNT # 5
    Local CHKPT already conducted. Store UNIQ.
39596.16000 Node #4 C_P rcvd MKR from Node #4 EVNT #
    Local CHKPT already conducted. Store UNIQ.
39597.53000 Node #4 C_P rcvd MKR from Node #2 EVNT #
    Local CHKPT already conducted. Store UNIQ.
39600.26000 Node #4 R_L rcvd FN_OFF from Node #1 EVNT #
    No further action required ATT.
39602.99900 Node #4 C_P rcvd CHKPT from Node #1 EVNT #
    Global CHKPT complete store NST
39605.74000 Node #4 R_L rcvd FN_ON from Node #2 EVNT #
    Neither act/deact node and changing NST.
    223212341234
39611.19900 Node #4 S_M rcvd PERIODIC from Node #3 EVNT # 3
    Reset Timer element of Node #3
39625.58000 Node #4 O_S sending STATUS msg. EVNT # 7
39626.17000 Node #4 S_M rcvd PERIODIC from Node #1 EVNT # 7
    Reset Timer element of Node #1
39628.90000 Node #4 S_M rcvd PERIODIC from Node #4 EVNT # 7
    Reset Timer element of Node #4
39631.62900 Node #4 S_M rcvd PERIODIC from Node #2 EVNT # 8
    Reset Timer element of Node #2




REFERENCES 

1. Kleinrock L., “Distributed Systems,” Communications of the ACM, Vol. 28, No. 11, NOV 1985.

2. Deitel H.M., Operating Systems, pp. 500-550, Addison-Wesley Co., 1990.

3. Mullender S. and others, Distributed Systems, pp. 319-357, Addison-Wesley Co., 1990.

4. Koo R., Toueg S., “Checkpointing and Rollback-Recovery for Distributed Systems,” IEEE Transactions on Software Engineering, Vol. SE-13, No. 1, JAN 1987.

5. Lala J.H., Harper R.E., Alger L.S., “A Design Approach for Ultrareliable Real-Time Systems,” Computer, Vol. 24, No. 5, MAY 1991.

6. Bhargava B., Lian S., “Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems - An Optimistic Approach,” Proc. of 7th Symp. on Reliable Distributed Systems, 1988.

7. Shukla S., Yang C., Puett R., Lehman K., Masters M., “A Framework for Node Failure/Repair Transparency in Distributed Real-time Systems,” paper submitted to the Fault Tolerant Computing International Symposium, Boston, MA, 1992.

8. Lehman K., Function Allocation in a Robust Distributed Real-Time Environment, Master’s Thesis, Naval Postgraduate School, Monterey, California, DEC 1991.




INITIAL  DISTRIBUTION  LIST 


1.  Defense  Technical  Information  Center  2 

Cameron  Station 

Alexandria,  VA  22304-6145 

2.  Library,  Code  52  2 

Naval  Postgraduate  School 

Monterey,  CA  93943-5000 

3.  Chairman,  Code  EC  1 

Department  of  Electrical  and  Computer  Engineering 

Naval  Postgraduate  School 
Monterey,  CA  93943-5000 

4.  Professor  Shridhar  B.  Shukla,  Code  EC/Sh  1 

Department  of  Electrical  and  Computer  Engineering 

Naval  Postgraduate  School 
Monterey,  CA  93943-5000 

5.  Professor  Chyan  Yang,  Code  EC/Ya  1 

Department  of  Electrical  and  Computer  Engineering 

Naval  Postgraduate  School 
Monterey,  CA  93943-5000 

6.  Commanding  Officer  1 

Supervisor of Shipbuilding, Conversion and Repair, USN
Pascagoula, MS 39568-2210

Michael  W.  Masters,  Code  N35 
Naval  Surface  Warfare  Center 
Dahlgren,  VA  22448-5000 


